We propose StyleCap, a method to generate natural language descriptions of
speaking styles appearing in speech. Although most conventional techniques for para-/non-linguistic information recognition focus on category classification or intensity estimation of pre-defined labels, they cannot explain their recognition results in an interpretable manner.
StyleCap is a first step towards an end-to-end method for generating
speaking-style prompts from speech, i.e., automatic speaking-style captioning.
StyleCap is trained with paired data of speech and natural language
descriptions. We train neural networks that convert a speech representation
vector into prefix vectors that are fed into a large language model (LLM)-based
text decoder. We explore text decoders and speech feature representations suitable for this new task. The experimental results demonstrate that StyleCap, which leverages richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence-rephrasing augmentation, improves the accuracy and diversity of generated speaking-style captions.
Samples of speaking-style captions generated by our StyleCap are publicly
available.
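
The core conversion step described above, a speech representation vector mapped to prefix vectors for the text decoder, can be sketched as follows. This is an illustrative stand-in, not the paper's trained network: the function name, dimensions, and the random projection are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def speech_to_prefix(speech_vec, n_prefix=10, d_model=768, W=None):
    """Map a pooled speech representation vector to n_prefix pseudo-token
    embeddings that can be prepended to the text decoder's input sequence."""
    if W is None:
        # stand-in for a learned projection; random weights for illustration
        W = rng.normal(scale=0.02, size=(n_prefix * d_model, speech_vec.shape[0]))
    return (W @ speech_vec).reshape(n_prefix, d_model)

speech_vec = rng.normal(size=256)      # e.g., a pooled SSL feature vector
prefix = speech_to_prefix(speech_vec)  # one row per prefix vector
```

In the actual system the projection would be trained end-to-end so that the LLM decoder, conditioned on these prefix vectors, generates the style caption.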
In this note, we consider the highly nonconvex optimization problem
associated with computing the rank decomposition of symmetric tensors. We
formulate the invariance properties of the loss function and show that critical
points detected by standard gradient based methods are \emph{symmetry breaking}
with respect to the target tensor. This phenomenon, observed for different choices of target tensors and norms, makes it possible to use recently developed analytic and algebraic tools for studying nonconvex optimization landscapes that exhibit symmetry breaking of a similar nature.
Despite the breakthroughs in biomarker discovery facilitated by differential
gene analysis, challenges remain, particularly at the single-cell level.
Traditional methodologies rely heavily on user-supplied cell annotations, focus on individually expressed data, and often neglect the critical interactions between biological conditions, such as healthy versus diseased states. In response, we introduce scBeacon, an innovative framework built
upon a deep contrastive siamese network. scBeacon pioneers an unsupervised
approach, adeptly identifying matched cell populations across varied
conditions, enabling a refined differential gene analysis. By utilizing a
VQ-VAE framework, a contrastive siamese network, and a greedy iterative
strategy, scBeacon effectively pinpoints differential genes that hold potential
as key biomarkers. Comprehensive evaluations on a diverse array of datasets
validate scBeacon's superiority over existing single-cell differential gene
analysis tools. Its precision and adaptability underscore its significant role
in enhancing diagnostic accuracy in biomarker discovery. Given the importance of biomarkers in diagnosis, scBeacon is positioned to become a pivotal asset in the evolution of personalized medicine and targeted treatments.
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Among the inherent sources, we look more closely at site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of probabilistic clinical models.
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively open-source tools, with the exception of the large language model itself. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation, coupled with open-source 130 nm silicon design tools, will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the bar to entry when building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
To benefit from the modeling capacity of deep models in system
identification, without worrying about inference time, this study presents a
novel training strategy that uses deep models only at the training stage. For this purpose, two separate models with different structures and goals are employed. The first, called the teacher model, is a deep generative model that captures the distribution of the system output(s); the second, named the student model, is a shallow basis-function model fed by the system input(s) to predict the system output(s). These two separate paths must therefore reach the same ultimate target. Because deep models perform well in modeling highly nonlinear systems, aligning the representation spaces learned by the two models lets the student model inherit the approximation power of the teacher model. The proposed objective function is the sum of the individual student and teacher objectives plus a distance penalty between the learned latent representations. Simulation results on three nonlinear benchmarks show performance comparable to deep architectures applied to the same benchmarks. Algorithmic transparency and structural efficiency are also achieved as byproducts.
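
The combined objective described above can be sketched as follows. This is a minimal illustration assuming MSE forms for all three terms; the actual losses, latent extraction, and weighting are the paper's own design choices.

```python
import numpy as np

def joint_objective(y_true, y_student, y_teacher, z_student, z_teacher, lam=1.0):
    """Sum of the student and teacher objectives plus a distance penalty
    between the learned latent representations (illustrative MSE forms)."""
    student_loss = np.mean((y_true - y_student) ** 2)   # shallow model fit
    teacher_loss = np.mean((y_true - y_teacher) ** 2)   # deep generative fit
    align_penalty = np.mean((z_student - z_teacher) ** 2)  # latent alignment
    return student_loss + teacher_loss + lam * align_penalty
```

Minimizing the alignment term is what transfers the teacher's representation to the student, which is then used alone at inference time.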
This report summarizes the 4th International Verification of Neural Networks
Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal
Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with
the 35th International Conference on Computer-Aided Verification (CAV).
VNN-COMP is held annually to facilitate the fair and objective comparison of
state-of-the-art neural network verification tools, encourage the
standardization of tool interfaces, and bring together the neural network
verification community. To this end, standardized formats for networks (ONNX)
and specifications (VNN-LIB) were defined, tools were evaluated on equal-cost
hardware (using an automatic evaluation pipeline based on AWS instances), and
tool parameters were chosen by the participants before the final test sets were
made public. In the 2023 iteration, 7 teams participated on a diverse set of 10
scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks,
participating tools, results, and lessons learned from this iteration of this
competition.
Though there has been substantial progress in developing quantum algorithms
to study classical datasets, the cost of simply \textit{loading} classical data
is an obstacle to quantum advantage. When the amplitude encoding is used,
loading an arbitrary classical vector requires up to exponential circuit depths
with respect to the number of qubits. Here, we address this ``input problem''
with two contributions. First, we introduce a circuit compilation method based
on tensor network (TN) theory. Our method -- AMLET (Automatic Multi-layer
Loader Exploiting TNs) -- proceeds via careful construction of a specific TN
topology and can be tailored to arbitrary circuit depths. Second, we perform
numerical experiments on real-world classical data from four distinct areas:
finance, images, fluid mechanics, and proteins. To the best of our knowledge,
this is the broadest numerical analysis to date of loading classical data into
a quantum computer. The required circuit depths are often several orders of
magnitude lower than the exponentially-scaling general loading algorithm would
require. Besides introducing a more efficient loading algorithm, this work
demonstrates that many classical datasets are loadable in depths that are much
shorter than previously expected, which has positive implications for speeding
up classical workloads on quantum computers.
We propose INFAMOUS-NeRF, an implicit morphable face model that introduces
hypernetworks to NeRF to improve the representation power in the presence of
many training subjects. At the same time, INFAMOUS-NeRF resolves the classic
hypernetwork tradeoff of representation power and editability by learning
semantically-aligned latent spaces despite the subject-specific models, all
without requiring a large pretrained model. INFAMOUS-NeRF further introduces a
novel constraint to improve NeRF rendering along the face boundary. Our
constraint can leverage photometric surface rendering and multi-view
supervision to guide surface color prediction and improve rendering near the
surface. Finally, we introduce a novel, loss-guided adaptive sampling method
for more effective NeRF training by reducing the sampling redundancy. We show
quantitatively and qualitatively that our method achieves higher representation
power than prior face modeling methods in both controlled and in-the-wild
settings. Code and models will be released upon publication.
The use of Mixed-Integer Linear Programming (MILP) models to represent neural
networks with Rectified Linear Unit (ReLU) activations has become increasingly
widespread in the last decade. This has enabled the use of MILP technology to
test, or stress, their behavior, to adversarially improve their training, and to embed them in optimization models leveraging their predictive power. Many of these MILP models rely on activation bounds, that is, bounds on the input values of each neuron. In this work, we explore the tradeoff between the
tightness of these bounds and the computational effort of solving the resulting
MILP models. We provide guidelines for implementing these models based on the
impact of network structure, regularization, and rounding.
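
Activation bounds of the kind discussed here can be obtained, for instance, by simple interval arithmetic on the network weights. A minimal sketch (one standard way to get valid, if loose, bounds; not necessarily the bounds studied in the paper):

```python
import numpy as np

def interval_bounds(Ws, bs, lo, up):
    """Propagate elementwise input intervals [lo, up] through a ReLU net,
    returning valid pre-activation bounds (l, u) for each layer."""
    bounds = []
    for W, b in zip(Ws, bs):
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        l = Wp @ lo + Wn @ up + b   # worst-case lower pre-activation
        u = Wp @ up + Wn @ lo + b   # worst-case upper pre-activation
        bounds.append((l, u))
        lo, up = np.maximum(l, 0), np.maximum(u, 0)  # apply ReLU to interval
    return bounds
```

Tighter bounds (e.g., from solving auxiliary LPs or MILPs per neuron) shrink the big-M constants in the MILP model but cost more to compute, which is exactly the tradeoff explored above.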
This study presents a novel approach to addressing the challenge of missing
data in multivariate time series, with a particular focus on the complexities
of healthcare data. Our Conditional Self-Attention Imputation (CSAI) model,
grounded in a transformer-based framework, introduces a conditional hidden
state initialization tailored to the intricacies of medical time series data.
This methodology diverges from traditional imputation techniques by
specifically targeting the imbalance in missing data distribution, a crucial
aspect often overlooked in healthcare datasets. By integrating advanced
knowledge embedding and a non-uniform masking strategy, CSAI adeptly adjusts to
the distinct patterns of missing data in Electronic Health Records (EHRs).
The recycling of waste electrical and electronic equipment is an essential
tool in allowing for a circular economy, presenting the potential for
significant environmental and economic gain. However, traditional material
separation techniques, based on physical and chemical processes, require
substantial investment and do not apply to all cases. In this work, we
investigate using an image classification neural network as a potential means
to control an automated material separation process in treating smartphone
waste, acting as a more efficient, less costly, and more widely applicable
alternative to existing tools. We produced a dataset with 1,127 images of
pyrolyzed smartphone components, which was then used to train and assess a
VGG-16 image classification model. The model achieved 83.33% accuracy, lending
credence to the viability of using such a neural network in material
separation.
Curriculum learning and imitation learning have been leveraged extensively in
the robotics domain. However, minimal research has been done on leveraging
these ideas on control tasks over highly stochastic time-series data. Here, we
theoretically and empirically explore these approaches in a representative
control task over complex time-series data. We implement the fundamental ideas
of curriculum learning via data augmentation, while imitation learning is
implemented via policy distillation from an oracle. Our findings reveal that
curriculum learning should be considered a novel direction for improving control-task performance over complex time series. Our extensive random-seed, out-of-sample experiments and ablation studies are highly encouraging for curriculum learning in time-series control. These findings are especially encouraging because
we tune all overlapping hyperparameters on the baseline -- giving an advantage
to the baseline. On the other hand, we find that imitation learning should be
used with caution.
Personalized Federated Learning (PFL) relies on collective data knowledge to
build customized models. However, non-IID data between clients poses
significant challenges, as collaborating with clients who have diverse data
distributions can harm local model performance, especially with limited
training data. To address this issue, we propose FedACS, a new PFL algorithm
with an Attention-based Client Selection mechanism. FedACS integrates an
attention mechanism to enhance collaboration among clients with similar data
distributions and mitigate the data scarcity issue. It prioritizes and
allocates resources based on data similarity. We further establish the
theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST validate FedACS's superiority over existing methods. By tackling non-IID data challenges and data scarcity, FedACS offers promising advances in personalized federated learning.
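
The attention-based selection idea above can be sketched as a softmax over client similarity: clients whose data statistics are closer to the querying client receive larger aggregation weights. The statistic, distance, and temperature below are illustrative assumptions, not FedACS's exact mechanism.

```python
import numpy as np

def attention_weights(query_stats, client_stats, temp=1.0):
    """Softmax attention over clients: smaller distance between data
    statistics yields a larger collaboration weight."""
    d = np.array([np.linalg.norm(query_stats - c) for c in client_stats])
    scores = -d / temp                 # similarity as negative distance
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()
```

A client with scarce data can then aggregate updates mostly from statistically similar peers instead of averaging uniformly over a non-IID population.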
We explore whether Enriched Category Theory could provide the foundation of an alternative approach to Machine Learning. This paper is the first to construct and motivate a Machine Learning algorithm solely with Enriched Category Theory. To supplement the evidence that Category Theory can be used to motivate robust and explainable algorithms, we show that a series of reasonable assumptions about a dataset leads to the construction of the Nearest Neighbours Algorithm, in particular as an extension of the original dataset using profunctors in the category of Lawvere metric spaces. This leads to a definition of an Enriched Nearest Neighbours Algorithm, which consequently also produces an enriched form of the Voronoi diagram. This paper is intended to be accessible without any knowledge of Category Theory.
Particle-based Variational Inference (ParVI) methods approximate the target
distribution by iteratively evolving finite weighted particle systems. Recent
advances of ParVI methods reveal the benefits of accelerated position update
strategies and dynamic weight adjustment approaches. In this paper, we propose
the first ParVI framework that simultaneously possesses both accelerated position updates and dynamic weight adjustment, named the General Accelerated Dynamic-Weight Particle-based Variational Inference (GAD-PVI) framework. GAD-PVI simulates the semi-Hamiltonian gradient flow on a novel Information-Fisher-Rao space, which yields an additional decrease in the local
functional dissipation. GAD-PVI is compatible with different dissimilarity
functionals and associated smoothing approaches under three information
metrics. Experiments on both synthetic and real-world data demonstrate the
faster convergence and reduced approximation error of GAD-PVI methods over the
state-of-the-art.
Randomized smoothing is currently the state-of-the-art method that provides
certified robustness for deep neural networks. However, due to its excessively
conservative nature, this method of incomplete verification often cannot
achieve an adequate certified radius on real-world datasets. One way to obtain
a larger certified radius is to use an input-specific algorithm instead of
using a fixed Gaussian filter for all data points. Several methods based on
this idea have been proposed, but they either suffer from high computational
costs or gain marginal improvement in certified radius. In this work, we show
that by exploiting the quasiconvex problem structure, we can find the optimal
certified radii for most data points with slight computational overhead. This
observation leads to an efficient and effective input-specific randomized
smoothing algorithm. We conduct extensive experiments and empirical analysis on
CIFAR-10 and ImageNet. The results show that the proposed method significantly
enhances the certified radii with low computational overhead.
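
For context, the certified radius in standard Gaussian randomized smoothing grows with both the noise level and the smoothed classifier's top-class probability, which is why choosing the noise level per input can enlarge it. A minimal sketch of the simplified radius formula (binary-case form; the paper's algorithm for choosing the noise level is not reproduced here):

```python
from statistics import NormalDist

def certified_radius(sigma, p_a):
    """Certified L2 radius for a point whose top class has (lower-bounded)
    probability p_a under isotropic Gaussian noise N(0, sigma^2 I)."""
    if p_a <= 0.5:
        return 0.0  # majority not established: nothing can be certified
    return sigma * NormalDist().inv_cdf(p_a)
```

A larger sigma certifies a larger radius for the same p_a, but in practice it also degrades p_a, so the best per-input sigma balances the two.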
We consider the optimization problem associated with fitting two-layer ReLU
networks with respect to the squared loss, where labels are assumed to be
generated by a target network. Focusing first on standard Gaussian inputs, we
show that the structure of spurious local minima detected by stochastic
gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of
symmetry} with respect to the target weights. A closer look at the analysis
indicates that this principle of least symmetry breaking may apply to a broader
range of settings. Motivated by this, we conduct a series of experiments which
corroborate this hypothesis for different classes of non-isotropic non-product
distributions, smooth activation functions and networks with a few layers.
Inverse reinforcement learning (IRL) usually assumes the model of the reward
function is pre-specified and estimates the parameter only. However, how to
determine a proper reward model is nontrivial. A simplistic model is less
likely to contain the real reward function, while a model with high complexity
leads to substantial computation cost and risks overfitting. This paper
addresses this trade-off in IRL model selection by introducing the structural
risk minimization (SRM) method from statistical learning. SRM selects an
optimal reward function class from a hypothesis set minimizing both estimation
error and model complexity. To formulate an SRM scheme for IRL, we estimate the policy gradient from demonstrations to serve as the empirical risk and establish an upper bound on the Rademacher complexity of the hypothesis classes as the model penalty. We further present a learning guarantee. In particular, we provide an explicit
SRM for the common linear weighted sum setting in IRL. Simulations demonstrate
the performance and efficiency of our scheme.
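
The SRM selection rule described above can be sketched generically: among candidate reward-function classes, pick the one minimizing empirical risk plus a complexity penalty. The penalty's exact form (here a Rademacher-style term scaled by a constant and the sample size) is an illustrative assumption.

```python
import numpy as np

def srm_select(empirical_risks, complexities, n, c=1.0):
    """Structural risk minimization over hypothesis classes: return the
    index minimizing empirical risk + complexity penalty, and that bound."""
    bounds = [r + c * comp / np.sqrt(n)
              for r, comp in zip(empirical_risks, complexities)]
    return int(np.argmin(bounds)), min(bounds)
```

Note how the rule rejects both the simplistic class (low complexity, high risk) and the over-rich class (low risk, high complexity), which is exactly the trade-off the abstract describes.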
We present convincing empirical results on the application of Randomized
Signature Methods for non-linear, non-parametric drift estimation for a
multi-variate financial market. Even though drift estimation is notoriously ill-defined due to the small signal-to-noise ratio, one can still try to learn optimal non-linear maps from data to future returns for the purposes of portfolio optimization. Randomized Signatures, in contrast to classical signatures, allow for high-dimensional markets and provide features on the same scale.
We do not contribute to the theory of Randomized Signatures here, but rather
present our empirical findings on portfolio selection in real world settings
including real market data and transaction costs.
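
A randomized signature of the kind used above can be sketched as a fixed random reservoir driven by path increments; the state dimension, activation, and initialization below are illustrative choices, not the paper's configuration.

```python
import numpy as np

def randomized_signature(path, dim=50, seed=0):
    """Reservoir-style randomized signature of a d-dimensional path (T x d):
    fixed random tensors A, b drive a nonlinear state updated by increments."""
    rng = np.random.default_rng(seed)
    T, d = path.shape
    A = rng.normal(size=(d, dim, dim)) / np.sqrt(dim)  # one matrix per channel
    b = rng.normal(size=(d, dim))
    x = np.zeros(dim)
    x[0] = 1.0  # conventional starting state
    for t in range(1, T):
        dz = path[t] - path[t - 1]
        x = x + sum(np.tanh(A[i] @ x + b[i]) * dz[i] for i in range(d))
    return x
```

Unlike the classical signature, whose dimension explodes combinatorially with the number of market channels, the feature dimension here is fixed by `dim`, and the features stay on a comparable scale.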
Deep learning algorithms, especially Transformer-based models, have achieved
significant performance by capturing long-range dependencies and historical
information. However, the power of convolution has not been fully investigated.
Moreover, most existing works ignore the dynamic interaction among variables
and evolutionary noise in series. Addressing these issues, we propose a
Hierarchical Memorizing Network (HMNet). In particular, a hierarchical
convolution structure is introduced to extract the information from the series
at various scales. In addition, we propose a dynamic variable interaction module to learn the varying correlations and an adaptive denoising module that searches for and exploits similar patterns to alleviate noise. These modules cooperate with the hierarchical structure from fine to coarse granularity.
Experiments on five benchmarks demonstrate that HMNet significantly outperforms
the state-of-the-art models by 10.6% on MSE and 5.7% on MAE. Our code is
released at https://github.com/yzhHoward/HMNet.
In this paper, we present XuanCe, a comprehensive and unified deep
reinforcement learning (DRL) library designed to be compatible with PyTorch,
TensorFlow, and MindSpore. XuanCe offers a wide range of functionalities,
including over 40 classical DRL and multi-agent DRL algorithms, with the
flexibility to easily incorporate new algorithms and environments. It is a
versatile DRL library that supports CPU, GPU, and Ascend, and can be executed
on various operating systems such as Ubuntu, Windows, MacOS, and EulerOS.
Extensive benchmarks conducted on popular environments including MuJoCo, Atari,
and StarCraftII multi-agent challenge demonstrate the library's impressive
performance. XuanCe is open-source and can be accessed at
https://github.com/agi-brain/xuance.git.
Recent work found high mutual information between the learned representations of large language models (LLMs) and the geospatial properties of their inputs, hinting at an emergent internal model of space. However, whether this internal space model has any causal effect on the LLMs' behavior was not answered by that work, leading to criticism that these findings are mere statistical correlation. Our study focuses on uncovering the causality of the spatial representations in LLMs. In particular, we discover potential spatial representations in DeBERTa and GPT-Neo using representational similarity analysis and linear and non-linear probing. Our causal intervention experiments show that the spatial representations influence the model's performance on next-word prediction and on a downstream task that relies on geospatial information. Our experiments suggest that LLMs learn and use an internal model of space in solving geospatially related tasks.
Leveraging knowledge from multiple tasks by introducing a small number of task-specific parameters, known as adapters, into each transformer layer has recently received much attention. However, adding an extra fusion layer to implement knowledge composition not only increases the inference time but also does not scale for some applications. To avoid these issues, we
propose a two-stage knowledge distillation algorithm called
AdapterDistillation. In the first stage, we extract task specific knowledge by
using local data to train a student adapter. In the second stage, we distill
the knowledge from the existing teacher adapters into the student adapter to
help its inference. Extensive experiments on frequently asked question
retrieval in task-oriented dialog systems validate the efficiency of
AdapterDistillation. We show that AdapterDistillation outperforms existing
algorithms in terms of accuracy, resource consumption and inference time.
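
The stage-two objective, a label loss plus a soft-target term pulling the student adapter toward the teacher adapters, can be sketched as below. The temperature, weighting, and exact loss forms are illustrative assumptions, not AdapterDistillation's published formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))  # shift for numerical stability
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, labels_onehot, T=2.0, alpha=0.5):
    """Cross-entropy on local labels plus a temperature-softened term
    matching the student adapter's outputs to the teacher adapters'."""
    p_s = softmax(student_logits)
    ce = -np.sum(labels_onehot * np.log(p_s + 1e-12))       # hard-label term
    p_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    kd = -np.sum(p_t * np.log(soft_s + 1e-12)) * T * T      # soft-label term
    return alpha * ce + (1 - alpha) * kd
```

Because the teacher knowledge is baked into the single student adapter, no fusion layer is needed at inference time.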
This paper analyses LightGCN in the context of graph recommendation
algorithms. Although Graph Convolutional Networks were initially designed for graph classification, their non-linear operations are not always essential. LightGCN enables linear propagation of embeddings, enhancing performance. We
reproduce the original findings, assess LightGCN's robustness on diverse
datasets and metrics, and explore Graph Diffusion as an augmentation of signal
propagation in LightGCN.
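
The linear propagation at the heart of LightGCN, repeated multiplication by the symmetrically normalized adjacency with no non-linearity or feature transform, followed by layer averaging, can be sketched as:

```python
import numpy as np

def lightgcn_propagate(A, E0, n_layers=3):
    """LightGCN-style propagation: E^(k+1) = A_hat E^(k), then average
    the embeddings of all layers (including layer 0)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    layers, E = [E0], E0
    for _ in range(n_layers):
        E = A_hat @ E
        layers.append(E)
    return np.mean(layers, axis=0)
```

In the recommendation setting, `A` would be the user-item bipartite interaction graph and `E0` the learned ID embeddings; dropping the per-layer transforms and activations is precisely what makes the model light.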
We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges
agents to use reasoning and decision-making skills to solve complex activities
that resemble everyday human challenges. The Mini-BEHAVIOR environment is a
fast, realistic Gridworld environment that offers the benefits of rapid
prototyping and ease of use while preserving a symbolic level of physical
realism and complexity found in complex embodied AI benchmarks. We introduce
key features such as procedural generation, to enable the creation of countless
task variations and support open-ended learning. Mini-BEHAVIOR provides
implementations of various household tasks from the original BEHAVIOR
benchmark, along with starter code for data collection and reinforcement
learning agent training. In essence, Mini-BEHAVIOR offers a fast, open-ended benchmark for evaluating decision-making and planning solutions in embodied AI, serving as a user-friendly entry point that simplifies the assessment and development of solutions while advancing the field. Code is publicly
available at https://github.com/StanfordVL/mini_behavior.
In this paper, we propose the use of self-supervised pretraining on a large
unlabelled data set to improve the performance of a personalized voice activity
detection (VAD) model in adverse conditions. We pretrain a long short-term
memory (LSTM)-encoder using the autoregressive predictive coding (APC)
framework and fine-tune it for personalized VAD. We also propose a denoising
variant of APC, with the goal of improving the robustness of personalized VAD.
The trained models are systematically evaluated on both clean speech and speech
contaminated by various types of noise at different SNR levels and compared to
a purely supervised model. Our experiments show that self-supervised
pretraining not only improves performance in clean conditions, but also yields
models which are more robust to adverse conditions compared to purely
supervised learning.
We present a comprehensive solution to learn and improve text-to-image models
from human preference feedback. To begin with, we build ImageReward -- the
first general-purpose text-to-image human preference reward model -- to
effectively encode human preferences. Its training is based on our systematic
annotation pipeline including rating and ranking, which collects 137k expert
comparisons to date. In human evaluation, ImageReward outperforms existing
scoring models and metrics, making it a promising automatic metric for
evaluating text-to-image synthesis. On top of it, we propose Reward Feedback
Learning (ReFL), a direct tuning algorithm to optimize diffusion models against
a scorer. Both automatic and human evaluation support ReFL's advantages over
compared methods. All code and datasets are provided at
\url{https://github.com/THUDM/ImageReward}.
Self-supervised learning (SSL) in audio holds significant potential across
various domains, particularly in situations where abundant, unlabeled data is
readily available at no cost. This is particularly pertinent in bioacoustics,
where biologists routinely collect extensive sound datasets from the natural
environment. In this study, we demonstrate that SSL is capable of acquiring
meaningful representations of bird sounds from audio recordings without the
need for annotations. Our experiments showcase that these learned
representations exhibit the capacity to generalize to new bird species in
few-shot learning (FSL) scenarios. Additionally, we show that selecting windows
with high bird activation for self-supervised learning, using a pretrained
audio neural network, significantly enhances the quality of the learned
representations.
Foundation models, specifically Large Language Models (LLMs), have lately gained widespread attention and adoption. Reinforcement Learning from Human Feedback (RLHF) involves training a reward model to capture desired behaviors, which is then used to align LLMs. These reward models are additionally used at inference time to estimate how well LLM responses adhere to those desired behaviors.
However, there is little work measuring how robust these reward models are to
distribution shifts. In this work, we evaluate how reward model performance -
measured via accuracy and calibration (i.e. alignment between accuracy and
confidence) - is affected by distribution shift. We show novel calibration
patterns and accuracy drops due to OOD prompts and responses, and that the
reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting to detect these distribution shifts
in prompts and responses.
We explore the possibility of fully replacing a plasma physics kinetic
simulator with a graph neural network-based simulator. We focus on this class
of surrogate models given the similarity between their message-passing update
mechanism and the traditional physics solver update, and the possibility of
enforcing known physical priors into the graph construction and update. We show
that our model learns the kinetic plasma dynamics of the one-dimensional plasma
model, a predecessor of contemporary kinetic plasma simulation codes, and
recovers a wide range of well-known kinetic plasma processes, including plasma thermalization, electrostatic fluctuations about thermal equilibrium, the drag on a fast sheet, and Landau damping. We compare the performance against the
original plasma model in terms of run-time, conservation laws, and temporal
evolution of key physical quantities. The limitations of the model are
presented and possible directions for higher-dimensional surrogate models for
kinetic plasmas are discussed.
Three-dimensional native states of natural proteins display recurring and
hierarchical patterns. Yet, traditional graph-based modeling of protein
structures is often limited to operate within a single fine-grained resolution,
and lacks hourglass neural architectures to learn those high-level building
blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant
coarse-graining model that efficiently operates on all-atom protein structures.
Our model departs from current approaches that employ graph modeling, instead
focusing on local convolutional coarsening to model sequence-motif interactions
with efficient time complexity in protein length. We measure the reconstruction
capabilities of Ophiuchus across different compression rates, and compare it to
existing models. We examine the learned latent space and demonstrate its
utility through conformational interpolation. Finally, we leverage denoising
diffusion probabilistic models (DDPM) in the latent space to efficiently sample
protein structures. Our experiments demonstrate Ophiuchus to be a scalable
basis for efficient protein modeling and generation.
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system
that allows control over speaker identity using natural language descriptions.
To control speaker identity within the prompt-based TTS framework, we introduce
the concept of speaker prompt, which describes voice characteristics (e.g.,
gender-neutral, young, old, and muffled) designed to be approximately
independent of speaking style. Since there is no large-scale dataset containing
speaker prompts, we first construct a dataset based on the LibriTTS-R corpus
with manually annotated speaker prompts. We then employ a diffusion-based
acoustic model with mixture density networks to model diverse speaker factors
in the training data. Unlike previous studies that rely on style prompts
describing only a limited aspect of speaker individuality, such as pitch,
speaking speed, and energy, our method utilizes an additional speaker prompt to
effectively learn the mapping from natural language descriptions to the
acoustic features of diverse speakers. Our subjective evaluation results show
that the proposed method can better control speaker characteristics than the
methods without the speaker prompt. Audio samples are available at
https://reppy4620.github.io/demo.promptttspp/.
( 2
min )
Model-based sequential approaches to discrete "black-box" optimization,
including Bayesian optimization techniques, often access the same points
multiple times for a given objective function of interest, resulting in many
steps to find the global optimum. Here, we numerically study the effect of a
postprocessing method on Bayesian optimization that strictly prohibits
duplicated samples in the dataset. We find the postprocessing method
significantly reduces the number of sequential steps to find the global
optimum, especially when the acquisition function is based on maximum a
posteriori estimation. Our results provide a simple but general strategy to
address the slow convergence of Bayesian optimization for high-dimensional problems.
( 2
min )
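The no-duplicate postprocessing can be sketched as a simple filter on the acquisition step for discrete candidates; the helper below is an illustrative version (function and variable names are ours, not from the paper):

```python
import numpy as np

def propose_without_duplicates(acquisition, candidates, observed):
    """Pick the acquisition-maximizing candidate not already in the dataset.

    acquisition: array of acquisition values, one per candidate row.
    candidates:  (n, d) array of discrete candidate points.
    observed:    set of tuples of points already sampled.
    """
    order = np.argsort(acquisition)[::-1]      # best candidates first
    for idx in order:
        point = tuple(candidates[idx])
        if point not in observed:              # strictly prohibit duplicates
            return idx
    raise RuntimeError("all candidates already sampled")

# Toy usage: four binary candidates, the two best already observed.
candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
acq = np.array([0.9, 0.8, 0.3, 0.1])
observed = {(0, 0), (0, 1)}
best = propose_without_duplicates(acq, candidates, observed)
```

Here the acquisition maximizer (0, 0) is skipped because it was already sampled, and the best unseen candidate is returned instead.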
We propose to enhance the training of physics-informed neural networks
(PINNs). To this aim, we introduce nonlinear additive and multiplicative
preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear
preconditioners are constructed by utilizing the Schwarz domain-decomposition
framework, where the parameters of the network are decomposed in a layer-wise
manner. Through a series of numerical experiments, we demonstrate that both
additive and multiplicative preconditioners significantly improve the
convergence of the standard L-BFGS optimizer, while providing more accurate
solutions of the underlying partial differential equations. Moreover, the
additive preconditioner is inherently parallel, thus giving rise to a novel
approach to model parallelism.
( 2
min )
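The additive Schwarz idea can be illustrated on a toy quadratic objective. The sketch below applies the additive combination of block-wise solves as a standalone iteration; in the paper it preconditions L-BFGS and the blocks correspond to network layers, whereas here each coordinate stands in for a "layer":

```python
import numpy as np

# Toy quadratic objective 0.5 * p^T A p over a two-block parameter vector.
A = np.array([[3.0, 0.5], [0.5, 2.0]])

def loss(p):
    return 0.5 * p @ A @ p

def grad(p):
    return A @ p

def additive_schwarz_step(p, inner_steps=5, lr=0.2):
    """One nonlinear additive Schwarz iteration with a block-wise split:
    optimize each parameter block separately (others frozen), then add the
    block-wise displacements together."""
    delta = np.zeros_like(p)
    for i in range(len(p)):              # one "layer" per coordinate block
        q = p.copy()
        for _ in range(inner_steps):     # inner gradient solve on block i only
            q[i] -= lr * grad(q)[i]
        delta[i] = q[i] - p[i]
    return p + delta                     # additive combination of updates

p = np.array([1.0, -1.0])
for _ in range(20):
    p = additive_schwarz_step(p)
final_loss = loss(p)
```

Because the inner solves are independent, the per-block work can run in parallel, which is the source of the model parallelism mentioned in the abstract.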
The training process of ReLU neural networks often exhibits complicated
nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose
significant challenges for theoretical analysis. Therefore, most previous
theoretical works on the optimization dynamics of neural networks focus either
on local analysis (like the end of training) or approximate linear models (like
Neural Tangent Kernel). In this work, we conduct a complete theoretical
characterization of the training process of a two-layer ReLU network trained by
Gradient Flow on linearly separable data. In this specific setting, our
analysis captures the whole optimization process starting from random
initialization to final convergence. Despite the relatively simple model and
data that we studied, we reveal four different phases from the whole training
process showing a general simplifying-to-complicating learning trend. Specific
nonlinear behaviors can also be precisely identified and captured
theoretically, such as initial condensation, saddle-to-plateau dynamics,
plateau escape, changes of activation patterns, learning with increasing
complexity, etc.
( 2
min )
We propose a new method to estimate a root-directed spanning tree from
extreme data. A prominent example is a river network, to be discovered from
extreme flow measured at a set of stations. Our new algorithm utilizes
qualitative aspects of a max-linear Bayesian network, which has been designed
for modelling causality in extremes. The algorithm estimates bivariate scores
and returns a root-directed spanning tree. It performs extremely well on
benchmark data and new data. We prove that the new estimator is consistent
under a max-linear Bayesian network model with noise. We also assess its
strengths and limitations in a small simulation study.
( 2
min )
We present a short tutorial on the use of the R gasper package. Gasper is
a package dedicated to signal processing on graphs. It also provides an
interface to the SuiteSparse Matrix Collection.
( 2
min )
This paper studies experimental designs for estimation and inference on
policies with spillover effects. Units are organized into a finite number of
large clusters and interact in unknown ways within each cluster. First, we
introduce a single-wave experiment that, by varying the randomization across
cluster pairs, estimates the marginal effect of a change in treatment
probabilities, taking spillover effects into account. Using the marginal
effect, we propose a test for policy optimality. Second, we design a
multiple-wave experiment to estimate welfare-maximizing treatment rules. We
provide strong theoretical guarantees and an implementation in a large-scale
field experiment.
( 2
min )
The use of transfer learning with deep neural networks has increasingly
become widespread for deploying well-tested computer vision systems to newer
domains, especially those with limited datasets. We describe a transfer
learning use case for a domain with a data-starved regime, having fewer than
100 labeled target samples. We evaluate the effectiveness of convolutional
feature extraction and fine-tuning of overparameterized models with respect to
the size of target training data, as well as their generalization performance
on data with covariate shift, or out-of-distribution (OOD) data. Our
experiments demonstrate that both overparameterization and feature reuse
contribute to the successful application of transfer learning in training image
classifiers in data-starved regimes. We provide visual explanations to support
our findings and conclude that transfer learning enhances the performance of
CNN architectures in data-starved regimes.
( 2
min )
These lecture notes give a statistical perspective on the foundations of
reinforcement learning and interactive decision making. We present a unifying
framework for addressing the exploration-exploitation dilemma using frequentist
and Bayesian approaches, with connections and parallels between supervised
learning/estimation and decision making as an overarching theme. Special
attention is paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual bandits,
structured bandits, and reinforcement learning with high-dimensional feedback.
( 2
min )
This paper introduces novel alternate training procedures for hard-parameter
sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces
challenges in managing conflicting loss gradients, often yielding sub-optimal
performance. The proposed alternate training method updates shared and
task-specific weights alternately, exploiting the multi-head architecture of
the model. This approach reduces computational costs, enhances training
regularization, and improves generalization. Convergence properties similar to
those of the classical stochastic gradient method are established. Empirical
experiments demonstrate delayed overfitting, improved prediction, and reduced
computational demands. In summary, our alternate training procedures offer a
promising advancement for the training of hard-parameter sharing MTNNs.
( 2
min )
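A minimal sketch of the alternate update scheme, assuming a toy linear model with one shared layer and two task heads (squared loss, manual gradients; all names are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hard-parameter-sharing model: one shared linear layer, two task heads.
W_shared = rng.normal(size=(3, 4)) * 0.5
heads = [rng.normal(size=4) * 0.5 for _ in range(2)]

def forward(x, head):
    return x @ W_shared @ head

def mse(x, ys):
    return sum((forward(x, h) - y) ** 2 for h, y in zip(heads, ys))

def alternate_step(x, ys, lr=0.05):
    """One round of alternate training: first update the task-specific heads
    with the shared weights frozen, then update the shared weights with the
    heads frozen."""
    global W_shared
    h = x @ W_shared                      # shared representation
    for t in range(2):                    # phase 1: task-specific updates
        err = h @ heads[t] - ys[t]
        heads[t] -= lr * err * h          # grad of 0.5*err^2 w.r.t. head t
    grad = np.zeros_like(W_shared)        # phase 2: shared-weight update
    for t in range(2):
        err = (x @ W_shared) @ heads[t] - ys[t]
        grad += err * np.outer(x, heads[t])
    W_shared -= lr * grad

x = np.array([1.0, -0.5, 0.2])
ys = [1.0, -1.0]
before = mse(x, ys)
for _ in range(200):
    alternate_step(x, ys)
after = mse(x, ys)
```

Each phase only touches one group of weights, which is what lets the method exploit the multi-head architecture and avoid mixing conflicting task gradients within a single update.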
We study the problem of learning linear temporal logic (LTL) formulas from
examples, as a first step towards expressing a property separating positive and
negative instances in a way that is comprehensible for humans. In this paper we
initiate the study of the computational complexity of the problem. Our main
results are hardness results: we show that the LTL learning problem is
NP-complete, both for the full logic and for almost all of its fragments. This
motivates the search for efficient heuristics, and highlights the complexity of
expressing separating properties in concise natural language.
( 2
min )
Generalized Labeled Multi-Bernoulli (GLMB) densities arise in a host of
multi-object system applications analogous to Gaussians in single-object
filtering. However, computing the GLMB filtering density requires solving
NP-hard problems. To alleviate this computational bottleneck, we develop a
linear complexity Gibbs sampling framework for GLMB density computation.
Specifically, we propose a tempered Gibbs sampler that exploits the structure
of the GLMB filtering density to achieve an $\mathcal{O}(T(P+M))$ complexity,
where $T$ is the number of iterations of the algorithm, and $P$ and $M$ are the
numbers of hypothesized objects and measurements, respectively. This innovation enables the GLMB
filter implementation to be reduced from an $\mathcal{O}(TP^{2}M)$ complexity
to $\mathcal{O}(T(P+M+\log T)+PM)$. Moreover, the proposed framework provides
the flexibility for trade-offs between tracking performance and computational
load. Convergence of the proposed Gibbs sampler is established, and numerical
studies are presented to validate the proposed GLMB filter implementation.
( 2
min )
We introduce a new empirical Bayes approach for large-scale multiple linear
regression. Our approach combines two key ideas: (i) the use of flexible
"adaptive shrinkage" priors, which approximate the nonparametric family of
scale mixture of normal distributions by a finite mixture of normal
distributions; and (ii) the use of variational approximations to efficiently
estimate prior hyperparameters and compute approximate posteriors. Combining
these two ideas results in fast and flexible methods, with computational speed
comparable to fast penalized regression methods such as the Lasso, and with
superior prediction accuracy across a wide range of scenarios. Furthermore, we
show that the posterior mean from our method can be interpreted as solving a
penalized regression problem, with the precise form of the penalty function
being learned from the data by directly solving an optimization problem (rather
than being tuned by cross-validation). Our methods are implemented in an R
package, mr.ash.alpha, available from
https://github.com/stephenslab/mr.ash.alpha
( 2
min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Of the inherent
sources we look a little deeper into site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of probabilistic clinical models.
( 2
min )
We consider the optimization problem associated with fitting two-layer ReLU
networks with respect to the squared loss, where labels are assumed to be
generated by a target network. Focusing first on standard Gaussian inputs, we
show that the structure of spurious local minima detected by stochastic
gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of
symmetry} with respect to the target weights. A closer look at the analysis
indicates that this principle of least symmetry breaking may apply to a broader
range of settings. Motivated by this, we conduct a series of experiments which
corroborate this hypothesis for different classes of non-isotropic non-product
distributions, smooth activation functions and networks with a few layers.
( 2
min )
One of the most recent and fascinating breakthroughs in artificial
intelligence is ChatGPT, a chatbot which can simulate human conversation.
ChatGPT is an instance of GPT-4, a language model based on generative
pretrained transformers. So if one wants to study, from a theoretical point of
view, how powerful such artificial intelligence can be, one approach is to
consider transformer networks and to study which problems one can solve with
these networks theoretically. Here it is not only important what kind of models
these networks can approximate, or how well they can generalize the knowledge
learned by choosing the best possible approximation to a concrete data set, but
also how well the optimization of such transformer networks on a concrete data
set works. In this article we consider all three of these aspects
simultaneously and show a theoretical upper bound on the misclassification
probability of a transformer network fitted to the observed data. For
simplicity we focus in this context on transformer encoder networks which can
be applied to define an estimate in the context of a classification problem
involving natural language.
( 2
min )
We propose a new method called the N-particle underdamped Langevin algorithm
for optimizing a special class of non-linear functionals defined over the space
of probability measures. Examples of problems with this formulation include
training neural networks in the mean-field regime, density estimation, and
kernel Stein discrepancy minimization. Our algorithm is based on a novel
space-time discretization of the mean-field underdamped Langevin dynamics, for
which we provide a new, fast mixing guarantee. In addition, we demonstrate that
our algorithm converges globally in total variation distance, bridging the
theoretical gap between the dynamics and its practical implementation.
( 2
min )
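For intuition, a generic Euler-Maruyama discretization of underdamped Langevin dynamics is sketched below, sampling a standard Gaussian target; the paper's N-particle algorithm uses a different, more careful space-time discretization of the mean-field dynamics:

```python
import numpy as np

rng = np.random.default_rng(1)

def underdamped_langevin(grad_U, n_particles=500, steps=2000,
                         dt=0.01, gamma=2.0):
    """Euler-Maruyama discretization of underdamped Langevin dynamics
        dx = v dt,   dv = (-grad_U(x) - gamma * v) dt + sqrt(2 * gamma) dW.
    Returns the final particle positions."""
    x = rng.normal(size=n_particles)          # positions
    v = np.zeros(n_particles)                 # velocities
    for _ in range(steps):
        x = x + dt * v
        v = (v + dt * (-grad_U(x) - gamma * v)
             + np.sqrt(2 * gamma * dt) * rng.normal(size=n_particles))
    return x

# Target: standard Gaussian, i.e. U(x) = x^2 / 2 and grad_U(x) = x.
samples = underdamped_langevin(lambda x: x)
```

The particles should settle into the target distribution, so their empirical mean and variance approximate 0 and 1 up to discretization and Monte Carlo error.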
These lecture notes give a statistical perspective on the foundations of
reinforcement learning and interactive decision making. We present a unifying
framework for addressing the exploration-exploitation dilemma using frequentist
and Bayesian approaches, with connections and parallels between supervised
learning/estimation and decision making as an overarching theme. Special
attention is paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual bandits,
structured bandits, and reinforcement learning with high-dimensional feedback.
( 2
min )
Whether abundant, endangered or extinct, animal species are the focus of countless AI-powered conservation projects. These initiatives — accelerated using NVIDIA GPUs, deep learning software and robotics technology — are alerting conservationists to poaching threats, powering more sustainable aquaculture and helping scientists monitor coral reef health. Take a safari through the NVIDIA Blog's top animal stories.
( 7
min )
Before ringing in the new year, GeForce NOW is taking a look back at a 2023 full of top-notch gaming. Explore GeForce NOW's year in review, which brought more hit games, improved service features and the launch of the Ultimate membership tier. Plus, GFN Thursday is raising a toast to the GeForce NOW community.
( 7
min )
In this paper, we study asynchronous stochastic approximation algorithms
without communication delays. Our main contribution is a stability proof for
these algorithms that extends a method of Borkar and Meyn by accommodating more
general noise conditions. We also derive convergence results from this
stability result and discuss their application in important average-reward
reinforcement learning problems.
( 2
min )
Out-of-distribution (OOD) detection is an important topic for real-world
machine learning systems, but settings with limited in-distribution samples
have been underexplored. Such few-shot OOD settings are challenging, as models
have scarce opportunities to learn the data distribution before being tasked
with identifying OOD samples. Indeed, we demonstrate that recent
state-of-the-art OOD methods fail to outperform simple baselines in the
few-shot setting. We thus propose a hypernetwork framework called HyperMix,
using Mixup on the generated classifier parameters, as well as a natural
out-of-episode outlier exposure technique that does not require an additional
outlier dataset. We conduct experiments on CIFAR-FS and MiniImageNet,
significantly outperforming other OOD methods in the few-shot regime.
( 2
min )
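The core Mixup-on-parameters idea can be sketched as follows, assuming two classifier weight vectors (hypothetically produced by a hypernetwork); HyperMix itself is more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_parameters(w_a, w_b, alpha=0.5):
    """Mix two classifier weight vectors with a Beta-distributed coefficient,
    as in standard Mixup but applied to (hypernetwork-generated) parameters
    rather than to inputs."""
    lam = rng.beta(alpha, alpha)
    return lam * w_a + (1 - lam) * w_b, lam

# Toy usage: mix two hypothetical generated weight vectors.
w_a = np.ones(4)
w_b = np.zeros(4)
w_mix, lam = mixup_parameters(w_a, w_b)
```

Mixing in parameter space augments the set of classifiers seen during an episode, which is useful precisely when in-distribution samples are too scarce to augment directly.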
Recent advancements in sensing and communication facilitate obtaining
high-frequency real-time data from various physical systems like power
networks, climate systems, biological networks, etc. However, since the data
are recorded by physical sensors, it is natural that the obtained data is
corrupted by measurement noise. In this paper, we present a novel algorithm for
online real-time learning of dynamical systems from noisy time-series data,
which employs the Robust Koopman operator framework to mitigate the effect of
measurement noise. The proposed algorithm has three main advantages: a) it
allows for online real-time monitoring of a dynamical system; b) it obtains a
linear representation of the underlying dynamical system, thus enabling the
user to use linear systems theory for analysis and control of the system; c) it
is computationally fast and less intensive than the popular Extended Dynamic
Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the
proposed algorithm by applying it to identify the Van der Pol oscillator, the
IEEE 68 bus system, and a ring network of Van der Pol oscillators.
( 2
min )
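As background, the plain least-squares Koopman/DMD estimate that the Robust Koopman framework builds on can be sketched as follows (the robust, noise-aware variant used in the paper differs):

```python
import numpy as np

def dmd_operator(X, Y):
    """Least-squares estimate of the linear operator A with Y ~ A X, where
    columns of X are states x_k and columns of Y are successors x_{k+1}.
    (Plain DMD; the paper's Robust Koopman variant additionally accounts
    for measurement noise.)"""
    return Y @ np.linalg.pinv(X)

# Toy linear system x_{k+1} = A_true x_k observed without noise.
A_true = np.array([[0.9, 0.1], [-0.2, 0.8]])
rng = np.random.default_rng(0)
x = rng.normal(size=2)
states = [x]
for _ in range(20):
    x = A_true @ x
    states.append(x)
S = np.array(states).T              # columns are states
A_hat = dmd_operator(S[:, :-1], S[:, 1:])
```

On clean data from a truly linear system, the least-squares estimate recovers the operator exactly, which is why the resulting linear representation can be analyzed with standard linear systems theory.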
Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to
create AI-enabled experiences in their products. Along with the benefits of
ease of use and shortened time to production, this reliance on proprietary APIs
has downsides in terms of model control, performance reliability, up-time
predictability, and cost. At the same time, there has been a flurry of open
source small language models (SLMs) that have been made available for
commercial use. However, their readiness to replace existing capabilities
remains unclear, and a systematic approach to test these models is not readily
available. In this paper, we present a systematic evaluation methodology for,
and characterization of, modern open source SLMs and their trade-offs when
replacing a proprietary LLM API for a real-world product feature. We have
designed SLaM, an automated analysis tool that enables the quantitative and
qualitative testing of product features utilizing arbitrary SLMs. Using SLaM,
we examine both the quality and the performance characteristics of modern SLMs
relative to an existing customer-facing OpenAI-based implementation. We find
that across 9 SLMs and 29 variants, we observe competitive quality-of-results
for our use case, significant performance consistency improvement, and a cost
reduction of 5x-29x when compared to OpenAI GPT-4.
( 3
min )
In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotic optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions, which are enabled by a truncation method.
( 2
min )
This paper explores the image synthesis capabilities of GPT-4, a leading
multi-modal large language model. We establish a benchmark for evaluating the
fidelity of texture features in images generated by GPT-4, comprising manually
painted pictures and their AI-generated counterparts. The contributions of this
study are threefold: First, we provide an in-depth analysis of the fidelity of
image synthesis features based on GPT-4, marking the first such study on this
state-of-the-art model. Second, the quantitative and qualitative experiments
fully reveal the limitations of the GPT-4 model in image synthesis. Third, we
have compiled a unique benchmark of manual drawings and corresponding
GPT-4-generated images, introducing a new task to advance fidelity research in
AI-generated content (AIGC). The dataset is available at:
\url{https://github.com/rickwang28574/DeepArt}.
( 2
min )
This paper presents a Gaussian Process (GP) framework, a non-parametric
technique widely acknowledged for regression and classification tasks, to
address inverse problems in mean field games (MFGs). By leveraging GPs, we aim
to recover agents' strategic actions and the environment's configurations from
partial and noisy observations of the population of agents and the setup of the
environment. Our method is a probabilistic tool to infer the behaviors of
agents in MFGs from data in scenarios where a comprehensive dataset is either
inaccessible or contaminated by noise.
( 2
min )
We propose a simple multivariate normality test based on Kac-Bernstein's
characterization, which can be conducted by utilising existing statistical
independence tests for sums and differences of data samples. We also perform
its empirical investigation, which reveals that for high-dimensional data, the
proposed approach may be more efficient than the alternative ones. The
accompanying code repository is provided at \url{https://shorturl.at/rtuy5}.
( 2
min )
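The characterization can be checked empirically: for i.i.d. samples split into two halves x and y, x + y and x - y are independent if and only if the data are normal. The sketch below uses a crude correlation-of-squares score in place of a proper independence test (a real implementation would plug in an existing independence test, as the abstract suggests):

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_diff_dependence(x, y):
    """Crude dependence score between x+y and x-y: absolute correlation of
    their squares. By the Kac-Bernstein characterization, if x and y are
    i.i.d. normal then x+y and x-y are independent, so the score is ~0."""
    s, d = x + y, x - y
    return abs(np.corrcoef(s**2, d**2)[0, 1])

n = 20000
normal_score = sum_diff_dependence(rng.normal(size=n), rng.normal(size=n))
expon_score = sum_diff_dependence(rng.exponential(size=n),
                                  rng.exponential(size=n))
```

For the normal sample the score is near zero, while for an exponential sample the squares of the sum and difference are strongly correlated, flagging non-normality.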
We explore the applications of random matrix theory (RMT) in the training of
deep neural networks (DNNs), focusing on layer pruning, i.e., reducing the
number of DNN parameters (weights). Our numerical results show that this
pruning leads to a drastic reduction of parameters while not reducing the
accuracy of DNNs and CNNs. Moreover, pruning the fully connected DNNs actually
increases the accuracy and decreases the variance for random initializations.
Our numerics indicate that this enhancement in accuracy is due to the
simplification of the loss landscape. We next provide rigorous mathematical
underpinning of these numerical results by proving the RMT-based Pruning
Theorem. Our results offer valuable insights into the practical application of
RMT for the creation of more efficient and accurate deep-learning models.
( 2
min )
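One common RMT-flavored pruning recipe, sketched below under our own assumptions (not necessarily the paper's exact criterion), is to discard singular values of a weight matrix that fall below the Marchenko-Pastur bulk edge expected of a pure-noise matrix:

```python
import numpy as np

def mp_prune(W, sigma=1.0):
    """Zero out singular values of W below the Marchenko-Pastur edge
    sigma * (sqrt(n) + sqrt(m)), keeping only components that stand out
    from what an all-noise matrix with entry scale sigma would produce."""
    n, m = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    threshold = sigma * (np.sqrt(n) + np.sqrt(m))
    s_pruned = np.where(s > threshold, s, 0.0)
    return U @ np.diag(s_pruned) @ Vt

rng = np.random.default_rng(0)
# Pure noise plus one strong rank-1 "signal" direction.
noise = rng.normal(size=(50, 50))
u = np.ones(50) / np.sqrt(50)
signal = 40.0 * np.outer(u, u)
W_pruned = mp_prune(noise + signal)
```

The pruned matrix keeps the strong signal direction (singular value near 40, above the edge of about 14.1) while the noise bulk is removed, drastically reducing the effective number of parameters.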
This paper proposes an efficient optimizer called AdaPlus which integrates
Nesterov momentum and precise stepsize adjustment on AdamW basis. AdaPlus
combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does
not introduce any extra hyper-parameters. We perform extensive experimental
evaluations on three machine learning tasks to validate the effectiveness of
AdaPlus. The experimental results validate that AdaPlus (i) among all the
evaluated adaptive methods, performs most comparably with (and even slightly
better than) SGD with momentum on image classification tasks and (ii)
outperforms other state-of-the-art optimizers on language modeling tasks and
exhibits high stability when training GANs. The experiment code of AdaPlus will
be accessible at: https://github.com/guanleics/AdaPlus.
( 2
min )
The growth of network-connected devices has led to an exponential increase in
data generation, creating significant challenges for efficient data analysis.
This data is generated continuously, creating a dynamic flow known as a data
stream. The characteristics of a data stream may change dynamically, and this
change is known as concept drift. Consequently, a method for handling data
streams must efficiently reduce their volume while dynamically adapting to
these changing characteristics. This paper proposes a simple online vector
quantization method for concept drift. The proposed method identifies and
replaces units with low win probability through remove-birth updating, thus
achieving rapid adaptation to concept drift. Furthermore, the results of this
study show that the proposed method produces few dead units even in the
presence of concept drift. This study also suggests that some metrics
calculated from the proposed method can be helpful for drift detection.
( 2
min )
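A minimal sketch of online vector quantization with remove-birth updating, with illustrative hyperparameter choices (the exact update rules and thresholds in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def online_vq(stream, n_units=8, lr=0.1, decay=0.99, p_min=0.01):
    """Online vector quantization with remove-birth updating: track each
    unit's exponentially smoothed win probability; when a unit's win
    probability drops below p_min, remove it and re-birth it at the latest
    input, so the codebook adapts quickly to concept drift."""
    units = rng.normal(size=(n_units, stream.shape[1]))
    win_p = np.full(n_units, 1.0 / n_units)
    for x in stream:
        k = np.argmin(((units - x) ** 2).sum(axis=1))  # winning unit
        units[k] += lr * (x - units[k])                # standard VQ update
        win_p *= decay
        win_p[k] += 1 - decay                          # smoothed win frequency
        dead = np.argmin(win_p)
        if win_p[dead] < p_min:                        # remove-birth step
            units[dead] = x.copy()
            win_p[dead] = 1.0 / n_units
    return units

# Concept drift: the data cluster jumps from around (0, 0) to around (5, 5).
stream = np.vstack([rng.normal(0.0, 0.3, size=(500, 2)),
                    rng.normal(5.0, 0.3, size=(500, 2))])
units = online_vq(stream)
```

After the drift, starved units are re-birthed inside the new cluster instead of lingering as dead units near the old one.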
Multi-query attention (MQA), which only uses a single key-value head,
drastically speeds up decoder inference. However, MQA can lead to quality
degradation, and moreover it may not be desirable to train a separate model
just for faster inference. We (1) propose a recipe for uptraining existing
multi-head language model checkpoints into models with MQA using 5% of original
pre-training compute, and (2) introduce grouped-query attention (GQA), a
generalization of multi-query attention which uses an intermediate number of
key-value heads (more than one, but fewer than the number of query heads). We show that
uptrained GQA achieves quality close to multi-head attention with comparable
speed to MQA.
( 2
min )
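Grouped-query attention reduces to sharing each key-value head across a group of query heads; a minimal NumPy sketch for a single unmasked sequence (weights and sizes are illustrative):

```python
import numpy as np

def grouped_query_attention(X, Wq, Wk, Wv, n_kv_heads):
    """Grouped-query attention: len(Wq) query heads share n_kv_heads
    key-value heads. n_kv_heads = 1 recovers MQA; n_kv_heads equal to the
    number of query heads recovers standard multi-head attention."""
    n_q = len(Wq)
    group = n_q // n_kv_heads              # query heads per KV head
    K = [X @ W for W in Wk]                # one K and V per KV head only
    V = [X @ W for W in Wv]
    outputs = []
    for h in range(n_q):
        g = h // group                     # shared KV head for this query head
        Q = X @ Wq[h]
        scores = Q @ K[g].T / np.sqrt(Q.shape[1])
        A = np.exp(scores - scores.max(axis=1, keepdims=True))
        A /= A.sum(axis=1, keepdims=True)  # softmax over keys
        outputs.append(A @ V[g])
    return np.concatenate(outputs, axis=1)

rng = np.random.default_rng(0)
T, d, dh = 4, 8, 2
Wq = [rng.normal(size=(d, dh)) for _ in range(4)]  # 4 query heads
Wk = [rng.normal(size=(d, dh)) for _ in range(2)]  # 2 shared KV heads
Wv = [rng.normal(size=(d, dh)) for _ in range(2)]
X = rng.normal(size=(T, d))
out = grouped_query_attention(X, Wq, Wk, Wv, n_kv_heads=2)
```

During decoding, only the KV heads need to be cached, so halving their number roughly halves the KV-cache traffic, which is the source of the speedup.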
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the reward distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
( 2
min )
We study the consistency of surrogate risks for robust binary classification.
It is common to learn robust classifiers by adversarial training, which seeks
to minimize the expected $0$-$1$ loss when each example can be maliciously
corrupted within a small ball. We give a simple and complete characterization
of the set of surrogate loss functions that are \emph{consistent}, i.e., that
can replace the $0$-$1$ loss without affecting the minimizing sequences of the
original adversarial risk, for any data distribution. We also prove a
quantitative version of adversarial consistency for the $\rho$-margin loss. Our
results reveal that the class of adversarially consistent surrogates is
substantially smaller than in the standard setting, where many common
surrogates are known to be consistent.
( 2
min )
The current trend in developing machine learning models for reading
comprehension and logical reasoning tasks is focused on improving the models'
abilities to understand and utilize logical rules. This work provides a novel
loss function and an accompanying model architecture that has more
interpretable components than some other models, by representing a common
strategy employed by humans when given reading comprehension and logical
reasoning tasks. Our strategy involves emphasizing relative accuracy over
absolute accuracy and can theoretically produce the correct answer with
incomplete knowledge. We examine the effectiveness of this strategy to solve
reading comprehension and logical reasoning questions. The models were
evaluated on the ReClor dataset, a challenging reading comprehension and
logical reasoning benchmark. We propose the polytuplet loss function, which
forces prioritization of learning the relative correctness of answer choices
over learning the true accuracy of each choice. Our results indicate that
models employing polytuplet loss outperform existing baseline models, though
further research is required to quantify the benefits it may present.
( 2
min )
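To illustrate the relative-over-absolute-accuracy idea, the sketch below uses a generic margin-based ranking loss over answer-choice scores; this is our own illustration, not the paper's exact polytuplet formulation:

```python
import numpy as np

def relative_margin_loss(scores, correct, margin=1.0):
    """Margin loss that only rewards the correct choice for scoring higher
    than each wrong choice (relative correctness), rather than pushing
    scores toward absolute targets. Illustrative stand-in for a
    relative-accuracy objective, not the polytuplet loss itself."""
    diffs = margin - (scores[correct] - np.delete(scores, correct))
    return np.maximum(0.0, diffs).sum()

# Choice 0 is correct; well-ranked scores incur zero loss.
scores = np.array([2.0, 0.5, 0.1, -1.0])
loss_good = relative_margin_loss(scores, correct=0)
# Reversed scores rank the correct choice last and are penalized.
loss_bad = relative_margin_loss(scores[::-1].copy(), correct=0)
```

A model trained this way can answer correctly with incomplete knowledge: it only needs the right choice to beat the others, not to hit an absolute confidence target.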
We introduce a new approach for generating sequences of implied volatility
(IV) surfaces across multiple assets that is faithful to historical prices. We
do so using a combination of functional data analysis and neural stochastic
differential equations (SDEs) combined with a probability integral transform
penalty to reduce model misspecification. We demonstrate that learning the
joint dynamics of IV surfaces and prices produces market scenarios that are
consistent with historical features and lie within the sub-manifold of surfaces
that are essentially free of static arbitrage. Finally, we demonstrate that
delta hedging using the simulated surfaces generates profit and loss (P&L)
distributions that are consistent with realised P&Ls.
( 2
min )
Arunachalam and de Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this ostensibly surprising message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
In nonstationary bandit learning problems, the decision-maker must
continually gather information and adapt their action selection as the latent
state of the environment evolves. In each time period, some latent optimal
action maximizes expected reward under the environment state. We view the
optimal action sequence as a stochastic process, and take an
information-theoretic approach to analyze attainable performance. We bound
limiting per-period regret in terms of the entropy rate of the optimal action
process. The bound applies to a wide array of problems studied in the
literature and reflects the problem's information structure through its
information-ratio.
( 2
min )
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of
distributed machine learning. By offloading the computation-intensive portions
to the server, SL is promising for deep model training on resource-constrained
devices, yet it still lacks a rigorous convergence analysis. In this paper, we
derive the convergence guarantees of Sequential SL (SSL, the vanilla case of SL
that conducts the model training in sequence) for strongly/general/non-convex
objectives on heterogeneous data. Notably, the derived guarantees suggest that
SSL is better than Federated Averaging (FedAvg, the most popular algorithm in
FL) on heterogeneous data. We validate the counterintuitive analysis result
empirically on extremely heterogeneous data.
( 2
min )
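The sequential training pattern of SSL can be sketched on a toy linear split model. This is an assumed setup for illustration only; the paper's analysis covers general strongly/general/non-convex objectives:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy split model: clients hold W1, the server holds W2; y_hat = (X @ W1) @ W2.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 1))
true_w = rng.normal(size=(4, 1))
clients = [(X, X @ true_w) for X in (rng.normal(size=(16, 4)) for _ in range(3))]
lr = 0.01

X0, y0 = clients[0]
initial_mse = float(np.mean((X0 @ W1 @ W2 - y0) ** 2))

# Sequential SL: clients train one after another on shared client-side
# weights, exchanging only activations and gradients with the server.
for epoch in range(300):
    for X, y in clients:                  # sequential pass over clients
        A = X @ W1                        # client forward; send activations
        y_hat = A @ W2                    # server forward
        g = 2 * (y_hat - y) / len(y)      # server loss gradient
        W1 -= lr * X.T @ (g @ W2.T)       # client update from returned grad
        W2 -= lr * A.T @ g                # server update

final_mse = float(np.mean((X0 @ W1 @ W2 - y0) ** 2))
print(round(final_mse, 4))
```

The key structural point is the inner loop: clients are visited in sequence and only the cut-layer activations and gradients cross the network, which is what distinguishes SSL from FedAvg-style parallel averaging.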
We study stochastic delayed feedback in general multi-agent sequential
decision making, which includes bandits, single-agent Markov decision processes
(MDPs), and Markov games (MGs). We propose a novel reduction-based framework,
which turns any multi-batched algorithm for sequential decision making with
instantaneous feedback into a sample-efficient algorithm that can handle
stochastic delays in sequential decision making. By plugging different
multi-batched algorithms into our framework, we provide several examples
demonstrating that our framework not only matches or improves existing results
for bandits, tabular MDPs, and tabular MGs, but also provides the first line of
studies on delays in sequential decision making with function approximation. In
summary, we provide a complete set of sharp results for multi-agent sequential
decision making with delayed feedback.
( 2
min )
Understanding the loss of information in spectral analytics is a crucial
first step towards finding root causes for failures and uncertainties using
spectral data in artificial intelligence models built from modern complex data
science applications. Here, we show from an elementary Shannon entropy model
analysis with quantum statistics of Gaussian distributed spectral data, that
the relative loss of information from dimensionality reduction due to the
projection of an initial five-dimensional dataset onto two-dimensional diagrams
is less than one percent in the parameter range of small data sets with sample
sizes on the order of a few hundred data samples. From our analysis, we also
conclude that the density and expectation value of the entropy probability
distribution increase with the sample number and sample size using artificial
data models derived from random sampling Monte Carlo simulation methods.
( 2
min )
We present a convolutional framework which significantly reduces the
complexity and thus, the computational effort for distributed reinforcement
learning control of dynamical systems governed by partial differential
equations (PDEs). Exploiting translational invariances, the high-dimensional
distributed control problem can be transformed into a multi-agent control
problem with many identical, uncoupled agents. Furthermore, using the fact that
information is transported with finite velocity in many cases, the dimension of
the agents' environment can be drastically reduced using a convolution
operation over the state space of the PDE. In this setting, the complexity can
be flexibly adjusted via the kernel width or by using a stride greater than
one. Moreover, scaling from smaller to larger systems -- or the transfer
between different domains -- becomes a straightforward task requiring little
effort. We demonstrate the performance of the proposed framework using several
PDE examples with increasing complexity, where stabilization is achieved by
training a low-dimensional deep deterministic policy gradient agent using
minimal computing resources.
( 2
min )
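The convolution-based dimension reduction can be sketched as windowed slicing of a discretized 1D PDE state, assuming a periodic domain (illustrative only, not the paper's exact operator):

```python
import numpy as np

def local_observations(state, kernel_width=5, stride=2):
    """Slice a discretized 1D PDE state into overlapping local windows.

    Each window is the low-dimensional environment of one identical,
    uncoupled agent; kernel width and stride trade off fidelity against
    complexity, mirroring the convolution over the PDE state space.
    """
    state = np.asarray(state, dtype=float)
    pad = kernel_width // 2
    padded = np.pad(state, pad, mode="wrap")   # periodic domain assumed
    centers = range(0, len(state), stride)
    return np.stack([padded[c:c + kernel_width] for c in centers])

obs = local_observations(np.arange(10.0), kernel_width=5, stride=2)
print(obs.shape)  # (5, 5)
```

A single shared policy then maps each low-dimensional window to a local control, and scaling to a larger domain only adds more windows, not more policy parameters.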
Effective representation of molecules is a crucial factor affecting the
performance of artificial intelligence models. This study introduces a
flexible, fragment-based, multiscale molecular representation framework called
t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with
Shared Atom), TSDY (t-SMILES with Dummy Atom) and TSID (t-SMILES with ID). It
describes molecules using SMILES-type strings obtained by performing a
breadth-first search on a full binary tree formed from a fragmented molecular
graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the
feasibility to construct a multilingual molecular description system, where
various descriptions complement each other, enhancing the overall performance.
Additionally, it exhibits impressive performance on low-resource datasets,
whether the model is original, data augmented, or pre-training fine-tuned. It
significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline
models in goal-directed tasks. Furthermore, it surpasses state-of-the-art
fragment-, graph- and SMILES-based approaches on ChEMBL, Zinc, and QM9.
( 2
min )
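The breadth-first serialization underlying t-SMILES can be sketched on a toy fragment tree. The `&`/`^` separator tokens below are illustrative placeholders, not necessarily the paper's exact syntax:

```python
from collections import deque

class Node:
    def __init__(self, frag, left=None, right=None):
        self.frag, self.left, self.right = frag, left, right

def bfs_serialize(root, null_token="&", sep="^"):
    """Breadth-first serialization of a binary tree of SMILES fragments.

    Mirrors the t-SMILES idea of encoding a fragmented molecular graph as
    a string via BFS over a binary tree; the null and separator tokens
    here are illustrative, not necessarily the paper's exact syntax.
    """
    out, q = [], deque([root])
    while q:
        node = q.popleft()
        if node is None:
            out.append(null_token)
            continue
        out.append(node.frag)
        q.append(node.left)
        q.append(node.right)
    return sep.join(out)

tree = Node("C1CCCCC1", Node("CO"), Node("c1ccccc1", None, Node("N")))
print(bfs_serialize(tree))
# C1CCCCC1^CO^c1ccccc1^&^&^&^N^&^&
```

Because the traversal order is fixed, the resulting string is a reversible, sequence-model-friendly description of the fragment tree.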
We consider (nonparametric) sparse additive models (SpAM) for classification.
The design of a SpAM classifier is based on minimizing the logistic loss with
sparse group Lasso/Slope-type penalties on the coefficients of univariate
additive components' expansions in orthonormal series (e.g., Fourier or
wavelets). The resulting classifier is inherently adaptive to the unknown
sparsity and smoothness. We show that under certain sparse group restricted
eigenvalue condition it is nearly-minimax (up to log-factors) simultaneously
across the entire range of analytic, Sobolev and Besov classes. The performance
of the proposed classifier is illustrated on simulated and real-data
examples.
( 2
min )
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling. We design a preference-based adversarial attack
framework and show that our NLI based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our NLI
based metrics outperform existing summarization metrics, but perform below SOTA
MT metrics. However, when combining existing metrics with our NLI metrics, we
obtain both higher adversarial robustness (15%-30%) and higher quality metrics
as measured on standard benchmarks (+5% to 30%).
( 2
min )
The accuracy of tinyML applications is often affected by various
environmental factors, such as noise, sensor location/calibration, and
time-related changes. This article introduces a neural-network-based on-device
learning (ODL) approach that addresses this issue by retraining in deployed
environments. Our approach relies on semi-supervised sequential training of
multiple neural networks tailored for low-end edge devices. This article
introduces its algorithm and implementation on wireless sensor nodes consisting
of a Raspberry Pi Pico and low-power wireless module. Experiments using
vibration patterns of rotating machines demonstrate that retraining by ODL
improves anomaly detection accuracy compared with a prediction-only deep neural
network in a noisy environment. The results also show that the ODL approach can
save communication cost and energy consumption for battery-powered Internet of
Things devices.
( 2
min )
Most fair machine learning methods either highly rely on the sensitive
information of the training samples or require a large modification on the
target models, which hinders their practical application. To address this
issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the
loss over the reweighted data set (second stage) where the sample weights are
computed to balance the model performance across different demographic groups
(first stage). FAIRIF can be applied on a wide range of models trained by
stochastic gradient descent without changing the model, while only requiring
group annotations on a small validation set to compute sample weights.
Theoretically, we show that, in the classification setting, three notions of
disparity among different groups can be mitigated by training with the weights.
Experiments on synthetic data sets demonstrate that FAIRIF yields models with
better fairness-utility trade-offs against various types of bias; and on
real-world data sets, we show the effectiveness and scalability of FAIRIF.
Moreover, as evidenced by the experiments with pretrained models, FAIRIF is
able to alleviate the unfairness issue of pretrained models without hurting
their performance.
( 3
min )
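The two-stage shape of FAIRIF can be sketched as follows. Note that the paper derives per-sample weights from influence functions computed on a small validation set; the stand-in below uses naive inverse group frequency purely to illustrate the structure:

```python
import numpy as np

def stage1_group_weights(groups):
    """Stage 1 (simplified): per-sample weights balancing group influence.

    FAIRIF derives weights from influence functions on a small validation
    set; as a stand-in, this sketch uses inverse group frequency so each
    demographic group contributes equally to the reweighted loss.
    """
    groups = np.asarray(groups)
    uniq, counts = np.unique(groups, return_counts=True)
    freq = dict(zip(uniq.tolist(), counts.tolist()))
    n, k = len(groups), len(uniq)
    return np.array([n / (k * freq[g]) for g in groups])

def stage2_weighted_loss(per_sample_losses, weights):
    """Stage 2: minimize the reweighted loss (no model changes needed)."""
    return float(np.average(per_sample_losses, weights=weights))

w = stage1_group_weights(["a", "a", "a", "b"])
print(w)  # minority group "b" gets the largest weight
```

Because only the loss weights change, any model trained by stochastic gradient descent can be plugged into stage 2 unmodified.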
Molecular design based on generative models, such as variational autoencoders
(VAEs), has become increasingly popular in recent years due to its efficiency
for exploring high-dimensional molecular space to identify molecules with
desired properties. While the efficacy of the initial model strongly depends on
the training data, the sampling efficiency of the model for suggesting novel
molecules with enhanced properties can be further enhanced via latent space
optimization. In this paper, we propose a multi-objective latent space
optimization (LSO) method that can significantly enhance the performance of
generative molecular design (GMD). The proposed method adopts an iterative
weighted retraining approach, where the respective weights of the molecules in
the training data are determined by their Pareto efficiency. We demonstrate
that our multi-objective GMD LSO method can significantly improve the
performance of GMD for jointly optimizing multiple molecular properties.
( 2
min )
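Pareto-efficiency-based retraining weights can be sketched by peeling successive Pareto fronts of the property values. The geometric decay over front ranks is an illustrative choice, not the paper's exact weighting rule:

```python
import numpy as np

def pareto_front_mask(props):
    """Boolean mask of non-dominated points (maximize every column)."""
    props = np.asarray(props, dtype=float)
    n = len(props)
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        better_eq = (props >= props[i]).all(axis=1)
        strictly = (props > props[i]).any(axis=1)
        dominated[i] = (better_eq & strictly).any()
    return ~dominated

def pareto_rank_weights(props, decay=0.5):
    """Retraining weights: higher for molecules on earlier Pareto fronts.

    Peeling successive fronts assigns each molecule a rank; the geometric
    decay here is illustrative, not the paper's exact rule.
    """
    props = np.asarray(props, dtype=float)
    remaining = np.arange(len(props))
    ranks = np.empty(len(props), dtype=int)
    r = 0
    while len(remaining):
        front = pareto_front_mask(props[remaining])
        ranks[remaining[front]] = r
        remaining = remaining[~front]
        r += 1
    return decay ** ranks

props = [[1.0, 0.2], [0.9, 0.9], [0.1, 0.1]]
print(pareto_rank_weights(props))  # [1.  1.  0.5]
```

Retraining the generative model with these weights biases the latent space toward jointly Pareto-efficient molecules across all target properties.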
Intent Detection is one of the core tasks of dialog systems. Few-shot Intent
Detection is challenging due to the limited number of annotated utterances for
novel classes. Generalized few-shot intent detection is a more realistic but
challenging setup, which aims to discriminate the joint label space of both
novel intents, which have a few examples each, and existing intents with
enough labeled data. Large label spaces and fewer shots increase the
complexity of the task. In this work, we employ a simple and effective method
based on Natural Language Inference that leverages the semantics in the
class-label names to learn and predict the novel classes. Our method achieves
state-of-the-art results on the 1-shot and 5-shot intent detection tasks, with
gains ranging from 2 to 8 percentage points in F1 score on four benchmark
datasets. Our method
also outperforms existing approaches on a more practical setting of generalized
few-shot intent detection with gains up to 20% F1 score. We show that the
suggested approach performs well across single and multi domain datasets with
the number of class labels from as few as 7 to as high as 150.
( 2
min )
The modeling and control of complex physical systems are essential in
real-world problems. We propose a novel framework that is generally applicable
to solving PDE-constrained optimal control problems by introducing surrogate
models for PDE solution operators with special regularizers. The procedure of
the proposed framework is divided into two phases: solution operator learning
for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once
the surrogate model is trained in Phase 1, the optimal control can be inferred
in Phase 2 without intensive computations. Our framework can be applied to both
data-driven and data-free cases. We demonstrate the successful application of
our method to various optimal control problems for different control variables
with diverse PDE constraints from the Poisson equation to Burgers' equation.
( 2
min )
Reinforcement learning has been used to train policies that outperform even
the best human players in various games. However, a large amount of data is
needed to achieve good performance, which in turn requires building large-scale
frameworks and simulators. In this paper, we study how large-scale
reinforcement learning can be applied to autonomous driving, analyze how the
resulting policies perform as the experiment size is scaled, and what the most
important factors contributing to policy performance are. To do this, we first
introduce a hardware-accelerated autonomous driving simulator, which allows us
to efficiently collect experience from billions of agent steps. This simulator
is paired with a large-scale, multi-GPU reinforcement learning framework. We
demonstrate that simultaneous scaling of dataset size, model size, and agent
steps trained provides increasingly strong driving policies with regard to
collisions, traffic rule violations, and progress. In particular, our best
policy reduces the failure rate by 57% while improving progress by 23% compared
to the current state-of-the-art machine learning policies for autonomous
driving.
( 2
min )
Resistor networks have recently had a surge of interest as substrates for
energy-efficient self-learning machines. This work studies the computational
capabilities of these resistor networks. We show that electrical networks
composed of voltage sources, linear resistors, diodes and voltage-controlled
voltage sources (VCVS) can implement any continuous function. To prove this, we
assume that the circuit elements are ideal and that the conductances of
variable resistors and the amplification factors of the VCVS's can take
arbitrary values -- arbitrarily small or arbitrarily large. The constructive
nature of our proof could also inform the design of such self-learning
electrical networks.
( 2
min )
The integration of different imaging modalities, such as structural,
diffusion tensor, and functional magnetic resonance imaging, with deep learning
models has yielded promising outcomes in discerning phenotypic characteristics
and enhancing disease diagnosis. The development of such a technique hinges on
the efficient fusion of heterogeneous multimodal features, which initially
reside within distinct representation spaces. Naively fusing the multimodal
features does not adequately capture the complementary information and could
even produce redundancy. In this work, we present a novel joint self-supervised
and supervised contrastive learning method to learn the robust latent feature
representation from multimodal MRI data, allowing the projection of
heterogeneous features into a shared common space, and thereby amalgamating
both complementary and analogous information across various modalities and
among similar subjects. We performed a comparative analysis between our
proposed method and alternative deep multimodal learning approaches. Through
extensive experiments on two independent datasets, the results demonstrated
that our method is significantly superior to several other deep multimodal
learning methods in predicting abnormal neurodevelopment. Our method has the
capability to facilitate computer-aided diagnosis within clinical practice,
harnessing the power of multimodal data.
( 2
min )
Model stores offer third-party ML models and datasets for easy project
integration, minimizing coding efforts. One might hope to find detailed
specifications of these models and datasets in the documentation, leveraging
documentation standards such as model and dataset cards. In this study, we use
statistical analysis and hybrid card sorting to assess the state of the
practice of documenting model cards and dataset cards in one of the largest
model stores in use today--Hugging Face (HF). Our findings show that only
21,902 models (39.62\%) and 1,925 datasets (28.48\%) have documentation.
Furthermore, we observe inconsistency in ethics and transparency-related
documentation for ML models and datasets.
( 2
min )
We propose an adaptive model-predictive controller that balances driving the
system to a goal state and seeking system observations that are informative
with respect to the parameters of a nonlinear autoregressive exogenous model.
The controller's objective function is derived from an expected free energy
functional and contains information-theoretic terms expressing uncertainty over
model parameters and output predictions. Experiments illustrate how parameter
uncertainty affects the control objective and evaluate the proposed controller
for a pendulum swing-up task.
( 2
min )
Numerous regularization methods for deformable image registration aim at
enforcing smooth transformations, but are difficult to tune a priori and
lack a clear physical basis. Physically inspired strategies have emerged,
offering a sound theoretical basis, but still necessitating complex
discretization and resolution schemes. This study introduces a regularization
strategy that does not require discretization, making it compatible with
current registration frameworks, while retaining the benefits of physically
motivated regularization for medical image registration. The proposed method
performs favorably in both synthetic and real datasets, exhibiting an accuracy
comparable to current state-of-the-art methods.
( 2
min )
Tensorial neural networks (TNNs) combine the successes of multilinear algebra
with those of deep learning to enable extremely efficient reduced-order models
of high-dimensional problems. Here, I describe a deep neural network
architecture that fuses multiple TNNs into a larger network, intended to solve
a broader class of problems than a single TNN. I evaluate this architecture,
referred to as a "stacked tensorial neural network" (STNN), on a parametric PDE
with three independent variables and three parameters. The three parameters
correspond to one PDE coefficient and two quantities describing the domain
geometry. The STNN provides an accurate reduced-order description of the
solution manifold over a wide range of parameters. There is also evidence of
meaningful generalization to parameter values outside its training data.
Finally, while the STNN architecture is relatively simple and problem agnostic,
it can be regularized to incorporate problem-specific features like symmetries
and physical modeling assumptions.
( 2
min )
In this paper we define a population parameter, ``Generalized Variable
Importance Metric (GVIM)'', to measure the importance of predictors for
black-box machine learning methods, where the importance is not represented by
a model-based parameter. GVIM is defined for each input variable, using the true
conditional expectation function, and it measures the variable's importance in
affecting a continuous or a binary response. We extend previously published
results to show that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for any kind of a predictor, which
gives it a causal interpretation and further justification as an alternative to
classical measures of significance that are only available in simple parametric
models. An extensive set of simulations, using realistically complex
relationships between covariates and outcomes and a number of regression
techniques of varying degrees of complexity, demonstrates the performance of
our proposed estimator of the GVIM.
( 2
min )
In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotic optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions, which are enabled by a truncation method.
( 2
min )
We introduce a pivot for exact selective inference with randomization. Not
only does our pivot lead to exact inference in Gaussian regression models, but
it is also available in closed form. We reduce the problem of exact selective
inference to a bivariate truncated Gaussian distribution. By doing so, we give
up some power that is achieved with approximate maximum likelihood estimation
in Panigrahi and Taylor (2022). Yet our pivot always produces narrower
confidence intervals than a closely related data splitting procedure. We
investigate the trade-off between power and exact selective inference on
simulated datasets and an HIV drug resistance dataset.
( 2
min )
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the reward distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
( 2
min )
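The clipping idea behind INF-clip can be sketched generically. The paper ties the clipping level to properties of the reward distribution; the fixed constant below is purely illustrative:

```python
import numpy as np

def clip_rewards(rewards, level):
    """Symmetric clipping used to tame heavy-tailed bandit rewards.

    Clipping at a suitable level bounds the variance of the estimates an
    implicitly normalized forecaster consumes; the fixed clipping level
    here is a generic illustration, not the paper's schedule.
    """
    r = np.asarray(rewards, dtype=float)
    return np.clip(r, -level, level)

rng = np.random.default_rng(1)
heavy = rng.standard_cauchy(10000)          # heavy-tailed (Cauchy) rewards
clipped = clip_rewards(heavy, level=10.0)
print(float(np.abs(clipped).max()))
```

Raw Cauchy rewards have no finite variance, while the clipped sequence is bounded, which is exactly what makes the downstream bandit estimates well behaved.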
We consider the problem of sufficient dimension reduction (SDR) for
multi-index models. The estimators of the central mean subspace in prior works
either have slow (non-parametric) convergence rates, or rely on stringent
distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$
being elliptically symmetric). In this paper, we show that a fast parametric
convergence rate of form $C_d \cdot n^{-1/2}$ is achievable via estimating the
\emph{expected smoothed gradient outer product}, for a general class of
distributions $P_{\mathbf{X}}$ that admits Gaussian and heavier-tailed
distributions. When
the link function is a polynomial with a degree of at most $r$ and
$P_{\mathbf{X}}$ is the standard Gaussian, we show that the prefactor depends
on the ambient dimension $d$ as $C_d \propto d^r$.
( 2
min )
Unsupervised learning has become a staple in classical machine learning,
successfully identifying clustering patterns in data across a broad range of
domain applications. Surprisingly, despite its accuracy and elegant simplicity,
unsupervised learning has not been sufficiently exploited in the realm of
phylogenetic tree inference. The main reason for the delay in adoption of
unsupervised learning in phylogenetics is the lack of a meaningful, yet simple,
way of embedding phylogenetic trees into a vector space. Here, we propose the
simple yet powerful split-weight embedding which allows us to fit standard
clustering algorithms to the space of phylogenetic trees. We show that our
split-weight embedded clustering is able to recover meaningful evolutionary
relationships in simulated and real (Adansonia baobabs) data.
( 2
min )
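The split-weight embedding can be sketched by indexing each tree's internal-edge splits in a common coordinate system. This is a toy representation (trees as split-to-weight dicts); a real pipeline would parse Newick trees first:

```python
import numpy as np

def split_weight_embedding(trees):
    """Embed phylogenetic trees as split-weight vectors.

    Each tree is represented as a dict mapping a split (a frozenset of the
    taxa on one side of an internal edge) to that edge's weight. Splits
    absent from a tree get weight 0, so every tree lands in the same
    vector space and standard clustering algorithms (k-means, etc.) apply.
    """
    all_splits = sorted({s for t in trees for s in t}, key=sorted)
    index = {s: i for i, s in enumerate(all_splits)}
    emb = np.zeros((len(trees), len(all_splits)))
    for row, t in enumerate(trees):
        for s, w in t.items():
            emb[row, index[s]] = w
    return emb, all_splits

t1 = {frozenset({"A", "B"}): 1.2, frozenset({"C", "D"}): 0.8}
t2 = {frozenset({"A", "C"}): 0.9, frozenset({"B", "D"}): 0.9}
emb, splits = split_weight_embedding([t1, t2])
print(emb.shape)  # (2, 4)
```

Trees with incompatible topologies (like the two above) share no splits and so sit far apart in the embedding, which is what lets off-the-shelf clustering recover topological groups.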
Predicting audio quality in voice synthesis and conversion systems is a
critical yet challenging task, especially when traditional methods like Mean
Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses
the gap in efficient audio quality prediction, especially in low-resource
settings where extensive MOS data from large-scale listening tests may be
unavailable. We demonstrate that uncertainty measures derived from
out-of-the-box pretrained self-supervised learning (SSL) models, such as
wav2vec, correlate with MOS scores. These findings are based on data from the
2022 and 2023 VoiceMOS challenges. We explore the extent of this correlation
across different models and language contexts, revealing insights into how
inherent uncertainties in SSL models can serve as effective proxies for audio
quality assessment. In particular, we show that the contrastive wav2vec models
are the most performant in all settings.
( 2
min )
Deep Neural Networks (DNNs) are powerful tools for various computer vision
tasks, yet they often struggle with reliable uncertainty quantification - a
critical requirement for real-world applications. Bayesian Neural Networks
(BNN) are equipped for uncertainty estimation but cannot scale to large DNNs
that are highly unstable to train. To address this challenge, we introduce the
Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to
seamlessly transform DNNs into BNNs in a post-hoc manner with minimal
computational and training overheads. ABNN preserves the main predictive
properties of DNNs while enhancing their uncertainty quantification abilities
through simple BNN adaptation layers (attached to normalization layers) and a
few fine-tuning steps on pre-trained models. We conduct extensive experiments
across multiple datasets for image classification and semantic segmentation
tasks, and our results demonstrate that ABNN achieves state-of-the-art
performance without the computational budget typically associated with ensemble
methods.
( 2
min )
NVIDIA’s AI Podcast had its best year yet — with a record-breaking 1.2 million plays in 2023 and each biweekly episode now drawing more than 30,000 listens. Among tech’s top podcasts, the AI Podcast has racked up more than 200 episodes and nearly 5 million total plays since its debut in 2016.
( 5
min )
NVIDIA’s holiday card — enchanting viewers from the perspective of snuggled-up family members on a couch — warmly depicts a crackling fireplace and an NVIDIA robo-dog by the hearth, all framed by a string of sparkling lights.
( 8
min )
Transformers play a central role in the inner workings of large language
models. We develop a mathematical framework for analyzing Transformers based on
their interpretation as interacting particle systems, which reveals that
clusters emerge in long time. Our study explores the underlying theory and
offers new perspectives for mathematicians as well as computer scientists.
( 2
min )
In this paper we revisit the classical problem of classification, but impose
privacy constraints. Under such constraints, the raw data
$(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers
are functions of the randomised outcome of a suitable local differential
privacy mechanism. The statistician is free to choose the form of this privacy
mechanism, and here we add Laplace distributed noise to a discretisation of the
location of each feature vector $X_i$ and to its label $Y_i$. The
classification rule is the privatized version of the well-studied partitioning
classification rule. In addition to the standard Lipschitz and margin
conditions, a novel characteristic is introduced, by which the exact rate of
convergence of the classification error probability is calculated, both for
non-private and private data.
( 2
min )
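The mechanism described can be sketched for a single feature in [0,1]. The noise scales below are illustrative, not the paper's calibrated constants:

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(x, y, bins=10, eps=1.0):
    """Local DP sketch: Laplace noise on a discretised feature and label.

    A 1D feature in [0,1] is mapped to a one-hot bin indicator; Laplace
    noise is added to every bin coordinate and to the label, in the
    spirit of the mechanism in the abstract. The scale 2/eps is an
    illustrative choice, not the paper's calibrated constant.
    """
    cell = min(int(x * bins), bins - 1)
    onehot = np.zeros(bins)
    onehot[cell] = 1.0
    z = onehot + rng.laplace(scale=2.0 / eps, size=bins)
    y_priv = y + rng.laplace(scale=2.0 / eps)
    return z, y_priv

z, y_priv = privatize(0.37, 1.0, bins=10, eps=1.0)
print(z.shape)  # (10,)
```

The statistician only ever sees `(z, y_priv)`, and the partitioning classifier is then built from these randomised outputs rather than the raw pairs.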
In this work, a novel Stackelberg game theoretic framework is proposed for
trading energy bidirectionally between the demand-response (DR) aggregator and
the prosumers. This formulation allows for flexible energy arbitrage and
additional monetary rewards while ensuring that the prosumers' desired daily
energy demand is met. Then, a scalable (linear with the number of prosumers),
decentralized, privacy-preserving algorithm is proposed to find approximate
equilibria with online sampling and learning of the prosumers' cumulative best
response, which finds applications beyond this energy game. Moreover, cost
bounds are provided on the quality of the approximate equilibrium solution.
Finally, real data from the California day-ahead market and the UC Davis campus
building energy demands are utilized to demonstrate the efficacy of the
proposed framework and algorithm.
( 2
min )
This paper explores the application of diffusion maps as graph shift operators
for understanding the underlying geometry of graph signals. The study evaluates
the improvements in graph learning obtained when applying diffusion-map-generated
filters to the Markov Variation minimization problem. The paper showcases the
effectiveness of this approach through examples involving synthetically
generated and real-world temperature sensor data. These examples also compare
the diffusion map graph signal model with other commonly used graph signal
operators. The results provide new approaches for the analysis and
understanding of complex, non-Euclidean data structures.
( 2
min )
We propose SutraNets, a novel method for neural probabilistic forecasting of
long-sequence time series. SutraNets use an autoregressive generative model to
factorize the likelihood of long sequences into products of conditional
probabilities. When generating long sequences, most autoregressive approaches
suffer from harmful error accumulation, as well as challenges in modeling
long-distance dependencies. SutraNets treat long, univariate prediction as
multivariate prediction over lower-frequency sub-series. Autoregression
proceeds across time and across sub-series in order to ensure coherent
multivariate (and, hence, high-frequency univariate) outputs. Since sub-series
can be generated using fewer steps, SutraNets effectively reduce error
accumulation and signal path distances. We find SutraNets to significantly
improve forecasting accuracy over competitive alternatives on six real-world
datasets, including when we vary the number of sub-series and scale up the
depth and width of the underlying sequence models.
( 2
min )
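The sub-series view that SutraNets takes of a long univariate sequence can be illustrated with a small sketch; the interleaved decomposition and helper names below are one plausible reading, not the authors' code:

```python
import numpy as np

def to_subseries(x, K):
    """Split a univariate series into K interleaved, lower-frequency
    sub-series: sub-series k holds x[k], x[k+K], x[k+2K], ...
    (len(x) must be divisible by K in this minimal sketch)."""
    return x.reshape(-1, K).T          # shape (K, len(x) // K)

def from_subseries(S):
    """Inverse: interleave the sub-series back into one series, so coherent
    multivariate outputs imply a coherent high-frequency univariate output."""
    return S.T.reshape(-1)
```

Each sub-series needs only len(x)/K autoregressive steps, which is how error accumulation and signal path distances shrink.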
Automated machine learning (AutoML) systems propose an end-to-end solution to
a given machine learning problem, creating either fixed or flexible pipelines.
Fixed pipelines are task-independent constructs: their general composition
remains the same, regardless of the data. In contrast, the structure of
flexible pipelines varies depending on the input, making them finely tailored
to individual tasks. However, flexible pipelines can be structurally
overcomplicated and have poor explainability. We propose the EVOSA approach,
which compensates for the drawbacks of flexible pipelines by incorporating a
sensitivity analysis that increases the robustness and interpretability of the
flexible solutions. EVOSA quantitatively estimates the positive and negative
impact of an edge or a node on a pipeline graph, and feeds this information to
the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA were
validated on tabular, multimodal and computer vision tasks, suggesting the
generalizability of the proposed approach across domains.
( 2
min )
Public release of the weights of pretrained foundation models, otherwise
known as downloadable access \citep{solaiman_gradient_2023}, enables
fine-tuning without the prohibitive expense of pretraining. Our work argues
that increasingly accessible fine-tuning of downloadable models may increase
hazards. First, we highlight research to improve the accessibility of
fine-tuning. We split our discussion into research that A) reduces the
computational cost of fine-tuning and B) improves the ability to share that
cost across more actors. Second, we argue that increasingly accessible
fine-tuning methods may increase hazards by facilitating malicious use and
making oversight of models with potentially dangerous capabilities more
difficult. Third, we discuss potential mitigatory measures, as well as benefits
of more accessible fine-tuning. Given substantial remaining uncertainty about
hazards, we conclude by emphasizing the urgent need for the development of
mitigations.
( 2
min )
The Adam optimizer is a popular choice in contemporary deep learning, due to
its strong empirical performance. However, we observe that in privacy-sensitive
scenarios, the traditional use of Differential Privacy (DP) with the Adam
optimizer leads to sub-optimal performance on several tasks. We find that this
performance degradation is due to a DP bias in Adam's second moment estimator,
introduced by the addition of independent noise in the gradient computation to
enforce DP guarantees. This DP bias leads to a different scaling for low
variance parameter updates, that is inconsistent with the behavior of
non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes
the bias in the second moment estimation and retrieves the expected behaviour
of Adam. Empirically, DP-AdamBC significantly improves the optimization
performance of DP-Adam by up to 3.5% in final accuracy in image, text, and
graph node classification tasks.
( 2
min )
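The second-moment bias correction that DP-AdamBC performs can be sketched as a single optimizer step; the function below is an illustrative reading of the idea with hypothetical names and constants, not the authors' implementation:

```python
import numpy as np

def dp_adam_bc_step(m, v, g_noisy, t, sigma2,
                    lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on a DP-noised gradient with a bias-corrected second
    moment: the variance sigma2 of the DP noise is subtracted from the
    second-moment estimate before scaling the update."""
    m = b1 * m + (1 - b1) * g_noisy
    v = b2 * v + (1 - b2) * g_noisy ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # E[g_noisy^2] = E[g^2] + sigma2, so remove the DP noise contribution
    v_corr = np.maximum(v_hat - sigma2, eps)
    step = lr * m_hat / (np.sqrt(v_corr) + eps)
    return m, v, step
```

Without the correction, coordinates whose apparent second moment is mostly DP noise receive updates that are scaled down inconsistently with non-private Adam.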
Pulmonary Hypertension (PH) is a severe disease characterized by an elevated
pulmonary artery pressure. The gold standard for PH diagnosis is measurement of
mean Pulmonary Artery Pressure (mPAP) during an invasive Right Heart
Catheterization. In this paper, we investigate a noninvasive approach to PH
detection utilizing Magnetic Resonance Imaging, Computer Models and Machine
Learning. We show, using an ablation study, that physics-informed feature
engineering based on models of blood circulation increases the performance of
Gradient Boosting Decision Tree-based algorithms for classification of PH and
regression of mPAP values. We compare results of regression (with
thresholding of estimated mPAP) and classification and demonstrate that metrics
achieved in both experiments are comparable. The predicted mPAP values are more
informative to the physicians than the probability of PH returned by
classification models. They provide an intuitive explanation of the outcome of
the machine learning model (clinicians are accustomed to the mPAP metric,
contrary to the PH probability).
( 2
min )
The goal of this work is to develop accurate Machine Learning (ML) models for
predicting the assembly axial neutron flux profiles in the SAFARI-1 research
reactor, trained by measurement data from historical cycles. The data-driven
nature of ML models makes them susceptible to uncertainties which are
introduced by sources such as noise in training data, incomplete coverage of
the domain, extrapolation and imperfect model architectures. To this end, we
also aim at quantifying the approximation uncertainties of the ML model
predictions. Previous work using Deep Neural Networks (DNNs) has been
successful for fuel assemblies in SAFARI-1, however, not as accurate for
control follower assemblies. The aim of this work is to improve the ML models
for the control assemblies by a combination of supervised and unsupervised ML
algorithms. The $k$-means and Affinity Propagation unsupervised ML algorithms
are employed to identify clusters in the set of the measured axial neutron flux
profiles. Then, regression-based supervised ML models using DNN (with
prediction uncertainties quantified with Monte Carlo dropout) and Gaussian
Process (GP) are trained for different clusters and the prediction uncertainty
is estimated. It was found that applying the proposed procedure improves the
prediction accuracy for the control assemblies and reduces the prediction
uncertainty. Flux shapes predicted by DNN and GP are very close, and the
overall accuracy became comparable to the fuel assemblies. The prediction
uncertainty is however smaller for GP models.
( 3
min )
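The cluster-then-regress procedure in the flux-profile abstract above can be sketched with off-the-shelf tools; the synthetic stand-in data, cluster count, and scalar target below are illustrative assumptions, not SAFARI-1 measurements:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

# Group measured profiles with k-means, then fit one regressor per cluster.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(60, 8))   # stand-in for measured axial profiles
inputs = rng.normal(size=(60, 3))     # stand-in for cycle/assembly conditions
peak_flux = profiles.max(axis=1)      # scalar regression target per profile

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
models = {}
for c in np.unique(labels):
    idx = labels == c
    models[c] = GaussianProcessRegressor().fit(inputs[idx], peak_flux[idx])

# Predict with the model of the matching cluster; the GP's returned std
# serves as the prediction uncertainty.
mean, std = models[labels[0]].predict(inputs[:1], return_std=True)
```

A DNN with Monte Carlo dropout could replace the GP per cluster, at the cost of a sampling-based rather than closed-form uncertainty.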
The joint source coding and modulation (JSCM) framework was enabled by recent
developments in deep learning, which make it possible to learn automatically
from data, in an end-to-end fashion, the best compression codes and modulation
schemes. In this paper, we show the existence of a strict tradeoff between
channel rate, distortion, perception, and classification accuracy in a JSCM
scenario. We then propose two image compression methods to navigate that
tradeoff: an inverse-domain generative adversarial network (ID-GAN), which
achieves extreme compression, and a simpler, heuristic method that reveals
insights about the performance of ID-GAN. Experimental results not only
corroborate the theoretical findings, but also demonstrate that the proposed
ID-GAN algorithm significantly improves system performance compared to
traditional separation-based methods and recent deep JSCM architectures.
( 2
min )
On-device training is essential for neural networks (NNs) to continuously
adapt to new online data, but can be time-consuming due to the device's limited
computing power. To speed up on-device training, existing schemes either select
the trainable NN portion offline or conduct unrecoverable selection at runtime,
but the evolution of the trainable NN portion is then constrained and cannot
adapt to the current training needs. Instead, runtime adaptation of on-device training
should be fully elastic, i.e., every NN substructure can be freely removed from
or added to the trainable NN portion at any time in training. In this paper, we
present ElasticTrainer, a new technique that enforces such elasticity to
achieve the required training speedup with the minimum NN accuracy loss.
Experiment results show that ElasticTrainer achieves up to 3.5x more training
speedup in wall-clock time and reduces energy consumption by 2x-3x more
compared to the existing schemes, without noticeable accuracy loss.
( 2
min )
To comply with new legal requirements and policies committed to privacy
protection, more and more companies are starting to deploy cross-silo Federated
Learning at global scale, where several clients/silos collaboratively train a
global model under the coordination of a central server. Instead of sharing and
transmitting data, clients train models using their private local data and
exchange model updates. However, there is little understanding of the carbon
emission impact of cross-silo Federated Learning, due to the lack of related
work. In this study, we first analyze the sustainability aspect of cross-silo
Federated Learning across the AI product life cycle, rather than focusing only
on model training, in comparison to the centralized method. We propose a more
holistic quantitative cost and CO2 emission estimation method for real-world
cross-silo Federated Learning settings. Second, we propose a novel data and
application management system using cross-silo Federated Learning and analytics
to make IT companies more sustainable and cost-effective.
( 2
min )
The growing number of wireless edge devices has magnified challenges
concerning energy, bandwidth, latency, and data heterogeneity. These challenges
have become bottlenecks for distributed learning. To address these issues, this
paper presents a novel approach that ensures energy efficiency for
distributionally robust federated learning (FL) with over-the-air computation
(AirComp). In this context, to effectively balance robustness with energy
efficiency, we introduce a novel client selection method that integrates two
complementary insights: a deterministic one that is designed for energy
efficiency, and a probabilistic one designed for distributional robustness.
Simulation results underscore the efficacy of the proposed algorithm, revealing
its superior performance compared to baselines from both robustness and energy
efficiency perspectives, achieving more than 3-fold energy savings compared to
the considered baselines.
( 2
min )
Operator learning aims to discover properties of an underlying dynamical
system or partial differential equation (PDE) from data. Here, we present a
step-by-step guide to operator learning. We explain the types of problems and
PDEs amenable to operator learning, discuss various neural network
architectures, and explain how to employ numerical PDE solvers effectively. We
also give advice on how to create and manage training data and conduct
optimization. We offer intuition behind the various neural network
architectures employed in operator learning by motivating them from the
point-of-view of numerical linear algebra.
( 2
min )
In this study, we establish that deep neural networks employing ReLU and
ReLU$^2$ activation functions are capable of representing Lagrange finite
element functions of any order on simplicial meshes across arbitrary
dimensions. We introduce a novel global formulation of the basis functions for
Lagrange elements, grounded in a geometric decomposition of these elements and
leveraging two essential properties of high-dimensional simplicial meshes and
barycentric coordinate functions. This representation theory facilitates a
natural approximation result for such deep neural networks. Our findings
present the first demonstration of how deep neural networks can systematically
generate general continuous piecewise polynomial functions.
( 2
min )
The detection of out-of-distribution data points is a common task in particle
physics. It is used for monitoring complex particle detectors or for
identifying rare and unexpected events that may be indicative of new phenomena
or physics beyond the Standard Model. Recent advances in Machine Learning for
anomaly detection have encouraged the utilization of such techniques on
particle physics problems. This review article provides an overview of the
state-of-the-art techniques for anomaly detection in particle physics using
machine learning. We discuss the challenges associated with anomaly detection
in large and complex data sets, such as those produced by high-energy particle
colliders, and highlight some of the successful applications of anomaly
detection in particle physics experiments.
( 2
min )
Microring resonators (MRRs) are promising devices for time-delay photonic
reservoir computing, but the impact of the different physical effects taking
place in the MRRs on the reservoir computing performance is yet to be fully
understood. We numerically analyze the impact of linear losses as well as
thermo-optic and free-carrier effects relaxation times on the prediction error
of the time-series task NARMA-10. We demonstrate the existence of three
regions, defined by the input power and the frequency detuning between the
optical source and the microring resonance, which reveal the cavity's transition
from linear to nonlinear regimes. One of these regions offers very low
time-series prediction error at relatively low input power and with few nodes,
while the other regions either lack nonlinearity or become unstable. This study
provides insight into the design of the MRR and the optimization of its
physical properties for improving the prediction performance of time-delay
reservoir computing.
( 2
min )
Forward invariance is a long-studied property in control theory that is used
to certify that a dynamical system stays within some pre-specified set of
states for all time, and also admits robustness guarantees (e.g., the
certificate holds under perturbations). We propose a general framework for
training and provably certifying robust forward invariance in Neural ODEs. We
apply this framework to provide certified safety in robust continuous control.
To our knowledge, this is the first instance of training Neural ODE policies
with such non-vacuous certified guarantees. In addition, we explore the
generality of our framework by using it to certify adversarial robustness for
image classification.
( 2
min )
Black-box variational inference is widely used in situations where there is
no proof that its stochastic optimization succeeds. We suggest this is due to a
theoretical gap in existing stochastic optimization proofs: namely the
challenge of gradient estimators with unusual noise bounds, and a composite
non-smooth objective. For dense Gaussian variational families, we observe that
existing gradient estimators based on reparameterization satisfy a quadratic
noise bound and give novel convergence guarantees for proximal and projected
stochastic gradient descent using this bound. This provides rigorous guarantees
that methods similar to those used in practice converge on realistic inference
problems.
( 2
min )
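The projected stochastic gradient scheme analyzed above can be sketched for a diagonal Gaussian variational family; the parameter names and the scale floor below are hypothetical, chosen only to show the projection step:

```python
import numpy as np

def projected_sgd_step(mu, scale, grad_mu, grad_scale, lr, floor=1e-6):
    """One projected-SGD step for a diagonal Gaussian variational family
    (mean mu, per-coordinate scale): after the gradient step, the scale is
    projected back onto the feasible set {scale >= floor}, keeping the
    iterate a valid Gaussian."""
    mu = mu - lr * grad_mu
    scale = np.maximum(scale - lr * grad_scale, floor)
    return mu, scale
```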
Through iterative, cross-disciplinary discussions, we define and propose
next-steps for Human-centered Generative AI (HGAI). We contribute a
comprehensive research agenda that lays out future directions of Generative AI
spanning three levels: aligning with human values; assimilating human intents;
and augmenting human abilities. By identifying these next-steps, we intend to
draw interdisciplinary research teams to pursue a coherent set of emergent
ideas in HGAI, focusing on their interested topics while maintaining a coherent
big picture of the future work landscape.
( 2
min )
Increased focus on the computational efficiency of NLP systems has motivated
the design of efficient model architectures and improvements to underlying
hardware accelerators. However, the resulting increases in computational
throughput and reductions in floating point operations have not directly
translated to improvements in wall-clock inference latency. We demonstrate that
these discrepancies can be largely attributed to bottlenecks introduced by deep
learning frameworks. We denote this phenomenon as the \textit{framework tax},
and observe that the disparity is growing as hardware speed increases over
time. In this work, we examine this phenomenon through a series of case studies
analyzing the effects of model design decisions, framework paradigms, and
hardware platforms on total model latency. Code is available at
https://github.com/JaredFern/Framework-Tax.
( 2
min )
The recent emergence of large language models (LLMs) shows the potential for
artificial general intelligence, revealing new opportunities in industry 4.0
and smart manufacturing. However, a notable gap exists in applying these LLMs
in industry, primarily due to their training on general knowledge rather than
domain-specific knowledge. Such specialized domain knowledge is vital for
effectively addressing the complex needs of industrial applications. To bridge
this gap, this paper proposes an Industrial Large Knowledge Model (ILKM)
framework, emphasizing its potential to revolutionize the industry in smart
manufacturing. In addition, ILKMs and LLMs are compared from eight
perspectives. Finally, the "6S Principle" is proposed as a guideline for the
development of ILKMs in smart manufacturing.
( 2
min )
Robustness is a fundamental property of machine learning classifiers required
to achieve safety and reliability. In the field of adversarial robustness of
image classifiers, robustness is commonly defined as the stability of a model
to all input changes within a p-norm distance. However, in the field of random
corruption robustness, variations observed in the real world are used, while
p-norm corruptions are rarely considered. This study investigates the use of
random p-norm corruptions to augment the training and test data of image
classifiers. We evaluate the model robustness against imperceptible random
p-norm corruptions and propose a novel robustness metric. We empirically
investigate whether robustness transfers across different p-norms and derive
conclusions on which p-norm corruptions a model should be trained and
evaluated. We find that training data augmentation with a combination of p-norm
corruptions significantly improves corruption robustness, even on top of
state-of-the-art data augmentation schemes.
( 2
min )
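A random p-norm corruption of the kind discussed above can be drawn by scaling a random direction to the desired norm; note that normalising a Gaussian draw is a simplification (it is not uniform on the p-sphere for p ≠ 2), and the function name is hypothetical:

```python
import numpy as np

def random_pnorm_corruption(shape, p, eps, rng):
    """Draw a random perturbation whose p-norm equals eps, by scaling a
    random Gaussian direction to the p-sphere of radius eps."""
    delta = rng.standard_normal(shape)
    delta /= np.linalg.norm(delta.ravel(), ord=p)
    return eps * delta
```

With small eps the corruption stays imperceptible, and sweeping p lets one test whether robustness transfers across norms.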
In recent years, the rapid development of deep learning has led to a wide
range of applications in the field of medical image classification. The
variants of neural network models with ever-increasing performance share some
commonalities: to try to mitigate overfitting, improve generalization, avoid
gradient vanishing and exploding, etc. AlexNet first utilizes the dropout
technique to mitigate overfitting and the ReLU activation function to avoid
gradient vanishing. Therefore, we focus our discussion on AlexNet, which has
contributed greatly to the development of CNNs since its introduction in 2012.
After reviewing over 40
papers, including journal papers and conference papers, we give a narrative on
the technical details, advantages, and application areas of AlexNet.
( 2
min )
We present a neural network for mitigating biased errors in pseudoranges to
improve localization performance with data collected from mobile phones. A
satellite-wise Multilayer Perceptron (MLP) is designed to regress the
pseudorange bias correction from six satellite-, receiver-, and context-related
features derived from Android raw Global Navigation Satellite System (GNSS)
measurements. To train the MLP, we carefully calculate the target values of
pseudorange bias using location ground truth and smoothing techniques and
optimize a loss function involving the estimation residuals of smartphone clock
bias. The corrected pseudoranges are then used by a model-based localization
engine to compute locations. The Google Smartphone Decimeter Challenge (GSDC)
dataset, which contains Android smartphone data collected from both rural and
urban areas, is utilized for evaluation. Both fingerprinting and cross-trace
localization results demonstrate that our proposed method outperforms
model-based and state-of-the-art data-driven approaches.
( 2
min )
Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic
to the articulatory space. Signal-processing features, like MFCCs, have been
widely used for the AAI task. For subjects with dysarthric speech, AAI is
challenging because of imprecise and indistinct pronunciation. In this work,
we perform AAI for dysarthric speech using representations from pre-trained
self-supervised learning (SSL) models. We demonstrate the impact of different
pre-trained features on this challenging AAI task, at low-resource conditions.
In addition, we also condition x-vectors to the extracted SSL features to train
a BLSTM network. In the seen case, we experiment with three AAI training
schemes (subject-specific, pooled, and fine-tuned). The results, consistent
across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves
a relative improvement of the Pearson Correlation Coefficient (CC) by ~1.81%
and ~4.56% for healthy controls and patients, respectively, over MFCCs. We
observe similar average trends for different SSL features in the unseen case.
Overall, SSL networks like wav2vec, APC, and DeCoAR, trained with feature
reconstruction or future timestep prediction tasks, perform well in predicting
dysarthric articulatory trajectories.
( 2
min )
The success of machine learning (ML) has been accompanied by increased
concerns about its trustworthiness. Several jurisdictions are preparing ML
regulatory frameworks. One such concern is ensuring that model training data
has desirable distributional properties for certain sensitive attributes. For
example, draft regulations indicate that model trainers are required to show
that training datasets have specific distributional properties, such as
reflecting the diversity of the population.
We propose the notion of property attestation allowing a prover (e.g., model
trainer) to demonstrate relevant distributional properties of training data to
a verifier (e.g., a customer) without revealing the data. We present an
effective hybrid property attestation combining property inference with
cryptographic mechanisms.
( 2
min )
We study and introduce new gradient operators in the complex and bicomplex
settings, inspired by the well-known Least Mean Square (LMS) algorithm
invented in 1960 by Widrow and Hoff for the Adaptive Linear Neuron (ADALINE).
These gradient operators are used to formulate new learning rules for
Bicomplex Least Mean Square (BLMS) algorithms, and we also formulate these
learning rules for the case of multicomplex LMS algorithms (MLMS). This
approach extends both the classical real and complex LMS algorithms.
( 2
min )
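For orientation, the classical complex LMS update that these operators generalize can be written as follows (standard textbook form, not taken from the abstract):

```latex
% Complex LMS (Widrow-Hoff): weights w_k, input x_k, desired output d_k,
% step size \mu; the bicomplex rules replace the complex conjugate with
% the appropriate bicomplex/multicomplex conjugation.
e_k = d_k - w_k^{H} x_k, \qquad
w_{k+1} = w_k + \mu \, e_k \, \overline{x_k}
```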
In recent years, generative adversarial networks (GANs) have been used to
supplement datasets within the field of marine bioacoustics, driven by factors
such as the cost of data collection, data sparsity, and the need to aid
preprocessing. One notable challenge with marine bioacoustic data is the low
signal-to-noise ratio (SNR), which poses difficulty when applying deep learning
techniques such as GANs. This work investigates the effect SNR has on
audio-based GAN performance and examines three different evaluation
methodologies for GAN performance, yielding interesting results on the effects
of SNR on GANs, specifically WaveGAN.
( 2
min )
Quantitative markets are characterized by swift dynamics and abundant
uncertainties, making the pursuit of profit-driven stock trading actions
inherently challenging. Within this context, reinforcement learning (RL), which
operates on a reward-centric mechanism for optimal control, has surfaced as a
potentially effective solution to the intricate financial decision-making
conundrums presented. This paper delves into the fusion of two established
financial trading strategies, namely the constant proportion portfolio
insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the
multi-agent deep deterministic policy gradient (MADDPG) framework. As a result,
we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and
TIPP-MADDPG, tailored for probing strategic trading within quantitative
markets. To validate these innovations, we implemented them on a diverse
selection of 100 real-market shares. Our empirical findings reveal that the
CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional
counterparts, affirming their efficacy in the realm of quantitative trading.
( 2
min )
In neural audio signal processing, pitch conditioning has been used to
enhance the performance of synthesizers. However, jointly training pitch
estimators and synthesizers is a challenge when using standard audio-to-audio
reconstruction loss, leading to reliance on external pitch trackers. To address
this issue, we propose using a spectral loss function inspired by optimal
transportation theory that minimizes the displacement of spectral energy. We
validate this approach through an unsupervised autoencoding task that fits a
harmonic template to harmonic signals. We jointly estimate the fundamental
frequency and amplitudes of harmonics using a lightweight encoder and
reconstruct the signals using a differentiable harmonic synthesizer. The
proposed approach offers a promising direction for improving unsupervised
parameter estimation in neural audio applications.
( 2
min )
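A one-dimensional optimal-transport spectral loss of the kind described above can be sketched as the L1 distance between spectral CDFs, which is the closed form of the 1-D Wasserstein-1 distance; this is an illustrative stand-in, not the paper's exact loss:

```python
import numpy as np

def spectral_ot_loss(x, y):
    """Wasserstein-1 distance between the normalised magnitude spectra of
    two signals: in 1-D the optimal transport cost equals the integral of
    |CDF_X - CDF_Y| over frequency bins."""
    X = np.abs(np.fft.rfft(x))
    Y = np.abs(np.fft.rfft(y))
    X, Y = X / X.sum(), Y / Y.sum()
    return np.abs(np.cumsum(X) - np.cumsum(Y)).sum()
```

Unlike a bin-wise spectral distance, this cost grows with how far spectral energy must be displaced, which gives the pitch estimator a smooth training signal.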
Photo-trapping cameras are widely employed for wildlife monitoring. These
cameras take photographs when motion is detected, aiming to capture images in
which animals appear. A significant portion of these images are empty: no
wildlife appears in them. Filtering out those images is not a trivial task,
since it requires hours of manual work from biologists. Therefore, there is a notable
interest in automating this task. Automatic discarding of empty photo-trapping
images is still an open field in the area of Machine Learning. Existing
solutions often rely on state-of-the-art supervised convolutional neural
networks that require the annotation of the images in the training phase.
PARDINUS (Weakly suPervised discARDINg of photo-trapping empty images based on
aUtoencoderS) is constructed on the foundation of weakly supervised learning
and proves that this approach equals or even surpasses other fully supervised
methods that require further labeling work.
( 2
min )
Submodular maximization over a matroid constraint is a fundamental problem
with various applications in machine learning. Some of these applications
involve decision-making over datapoints with sensitive attributes such as
gender or race. In such settings, it is crucial to guarantee that the selected
solution is fairly distributed with respect to this attribute. Recently,
fairness has been investigated in submodular maximization under a cardinality
constraint in both the streaming and offline settings; however, the more general
problem with a matroid constraint has only been considered in the streaming
setting, and only for monotone objectives. This work fills this gap. We propose
various algorithms and impossibility results offering different trade-offs
between quality, fairness, and generality.
( 2
min )
The increasing reliance of drivers on navigation applications has made
transportation networks more susceptible to data-manipulation attacks by
malicious actors. Adversaries may exploit vulnerabilities in the data
collection or processing of navigation services to inject false information,
and to thus interfere with the drivers' route selection. Such attacks can
significantly increase traffic congestions, resulting in substantial waste of
time and resources, and may even disrupt essential services that rely on road
networks. To assess the threat posed by such attacks, we introduce a
computational framework to find worst-case data-injection attacks against
transportation networks. First, we devise an adversarial model with a threat
actor who can manipulate drivers by increasing the travel times that they
perceive on certain roads. Then, we employ hierarchical multi-agent
reinforcement learning to find an approximate optimal adversarial strategy for
data manipulation. We demonstrate the applicability of our approach through
simulating attacks on the Sioux Falls, SD network topology.
( 2
min )
In this work, we propose REBEL, an algorithm for sample efficient reward
regularization based robotic reinforcement learning from human feedback
(RRLHF). Reinforcement learning (RL) performance for continuous control
robotics tasks is sensitive to the underlying reward function. In practice, the
reward function often ends up misaligned with human intent, values, social
norms, etc., leading to catastrophic failures in the real world. We leverage
human preferences to learn regularized reward functions and eventually align
the agents with the true intended behavior. We introduce a novel notion of
reward regularization into the existing RRLHF framework, termed agent
preferences. Thus, we not only consider human feedback in terms of preferences,
but also propose to take into account the preferences of the underlying RL agent
while learning the reward function. We show that this helps to mitigate the
over-optimization associated with the design of reward functions in RL. We
experimentally show that REBEL exhibits up to 70% improvement in sample
efficiency to achieve a similar level of episodic reward returns as compared to
the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.
( 2
min )
Traditional spectral energy distribution (SED) fitting techniques face
uncertainties due to assumptions in star formation histories and dust
attenuation curves. We propose an advanced machine learning-based approach that
enhances flexibility and uncertainty quantification in SED fitting. Unlike the
fixed NGBoost model used in mirkwood, our approach allows for any
sklearn-compatible model, including deterministic models. We incorporate
conformalized quantile regression to convert point predictions into error bars,
enhancing interpretability and reliability. Using CatBoost as the base
predictor, we compare results with and without conformal prediction,
demonstrating improved performance using metrics such as coverage and interval
width. Our method offers a more versatile and accurate tool for deriving galaxy
physical properties from observational data.
( 2
min )
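The conformalized step that turns point predictions into error bars can be sketched with split conformal prediction; the function below is a minimal illustration of that step, not the mirkwood/CatBoost pipeline:

```python
import numpy as np

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split-conformal band: widen point predictions by the finite-sample
    (1 - alpha) quantile of absolute residuals on a held-out calibration
    set, giving intervals with marginal coverage about 1 - alpha."""
    n = len(cal_residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.abs(cal_residuals), level)
    return y_pred - q, y_pred + q
```

Any sklearn-compatible point predictor can be wrapped this way; coverage and interval width then serve as the evaluation metrics.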
In this work, we introduce an innovative autoregressive model leveraging
Generative Pretrained Transformer (GPT) architectures, tailored for fraud
detection in payment systems. Our approach confronts the problem of token
explosion and reconstructs behavioral sequences, providing a nuanced
understanding of transactional behavior through temporal and contextual
analysis. Utilizing unsupervised pretraining, our model excels in feature
representation without the need for labeled data. Additionally, we integrate a
differential convolutional approach to enhance anomaly detection, bolstering
the security and efficacy of one of the largest online payment merchants in
China. The scalability and adaptability of our model promise broad
applicability in various transactional contexts.
( 2
min )
Machine learning places the greatest demand on today's computing. This paper
analyzes three machine learning algorithms: transformers, spatial convolution,
and FFT. The analysis is novel in three aspects. First, it measures the cost of
memory access on an abstract memory hierarchy, instead of traditional time or
space complexity. Second, the analysis is asymptotic and identifies the primary
sources of the memory cost. Finally, the result is symbolic, which can be used
to select algorithmic parameters such as the group size in grouped query
attention for any dimension size and number of heads and the batch size for
batched convolution for any image size and kernel size.
( 2
min )
Training large foundation models using self-supervised objectives on
unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a
standard procedure. Unfortunately, the efficacy of this approach is often
constrained by both limited fine-tuning compute and scarcity in labeled
downstream data. We introduce Multimodal Attention Merging (MAM), an approach
that facilitates direct knowledge transfer from the attention matrices of
models rooted in high-resource modalities (text and images) to those in
resource-constrained domains (speech and audio), employing a zero-shot paradigm.
MAM reduces the relative Word Error Rate (WER) of an Automatic Speech
Recognition (ASR) model by up to 6.70%, and relative classification error of an
Audio Event Classification (AEC) model by 10.63%. In cases where some
data/compute is available, we present Learnable-MAM, a data-driven approach to
merging attention matrices, resulting in a further 2.90% relative reduction in
WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
( 2
min )
Gate-defined quantum dots are a promising candidate system to realize
scalable, coupled qubit systems and serve as a fundamental building block for
quantum computers. However, present-day quantum dot devices suffer from
imperfections that must be accounted for, which hinders the characterization,
tuning, and operation process. Moreover, with an increasing number of quantum
dot qubits, the relevant parameter space grows sufficiently to make heuristic
control infeasible. Thus, it is imperative that reliable and scalable
autonomous tuning approaches are developed. In this report, we outline current
challenges in automating quantum dot device tuning and operation with a
particular focus on datasets, benchmarking, and standardization. We also
present ideas put forward by the quantum dot community on how to overcome them.
( 2
min )
Discovering mathematical models that characterize the observed behavior of
dynamical systems remains a major challenge, especially for systems in a
chaotic regime. The challenge is even greater when the physics underlying such
systems is not yet understood, and scientific inquiry must solely rely on
empirical data. Driven by the need to fill this gap, we develop a framework
that learns mathematical expressions modeling complex dynamical behaviors by
identifying differential equations from noisy and sparse observable data. We
train a small neural network to learn the dynamics of a system, its rate of
change in time, and missing model terms, which are used as input for a symbolic
regression algorithm to autonomously distill the explicit mathematical terms.
This, in turn, enables us to predict the future evolution of the dynamical
behavior. The performance of this framework is validated by recovering the
right-hand sides and unknown terms of certain complex, chaotic systems such as
the well-known Lorenz system, a six-dimensional hyperchaotic system, and the
non-autonomous Sprott chaotic system, and comparing them with their known
analytical expressions.
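To make the pipeline concrete, the sketch below mimics it in pure NumPy on the Lorenz system: central differences stand in for the neural network's learned rate of change, and thresholded least squares over a polynomial candidate library stands in for the symbolic regression step (both substitutions are ours, not the framework's actual components):

```python
import numpy as np

# Simulate the Lorenz system (sigma=10, rho=28, beta=8/3) with RK4
def lorenz(s):
    x, y, z = s
    return np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])

dt, n = 0.002, 5000
traj = np.empty((n, 3))
traj[0] = [1.0, 1.0, 1.0]
for i in range(n - 1):
    s = traj[i]
    k1 = lorenz(s); k2 = lorenz(s + dt / 2 * k1)
    k3 = lorenz(s + dt / 2 * k2); k4 = lorenz(s + dt * k3)
    traj[i + 1] = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Estimated time derivatives (central differences stand in for the neural net)
dX = np.gradient(traj, dt, axis=0)

# Candidate library of terms: [1, x, y, z, xy, xz, yz, x^2, y^2, z^2]
x, y, z = traj.T
theta = np.column_stack([np.ones(n), x, y, z, x * y, x * z, y * z, x * x, y * y, z * z])

# Sparse regression: least squares plus hard thresholding of small coefficients
coef, *_ = np.linalg.lstsq(theta, dX, rcond=None)
coef[np.abs(coef) < 0.1] = 0.0
```

Reading off `coef` recovers the familiar right-hand sides, e.g. dx/dt = -10x + 10y and the xy term in dz/dt.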
( 2
min )
The surge in high-throughput omics data has reshaped the landscape of
biological research, underlining the need for powerful, user-friendly data
analysis and interpretation tools. This paper presents GenoCraft, a web-based
comprehensive software solution designed to handle the entire pipeline of omics
data processing. GenoCraft offers a unified platform featuring advanced
bioinformatics tools, covering all aspects of omics data analysis. It
encompasses a range of functionalities, such as normalization, quality control,
differential analysis, network analysis, pathway analysis, and diverse
visualization techniques. This software makes state-of-the-art omics data
analysis more accessible to a wider range of users. With GenoCraft, researchers
and data scientists have access to an array of cutting-edge bioinformatics
tools under a user-friendly interface, making it a valuable resource for
managing and analyzing large-scale omics data. The API with an interactive web
interface is publicly available at https://genocraft.stanford.edu/. We also
release all the code at https://github.com/futianfan/GenoCraft.
( 2
min )
In this paper we revisit the classical problem of classification, but impose
privacy constraints. Under such constraints, the raw data
$(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers
are functions of the randomised outcome of a suitable local differential
privacy mechanism. The statistician is free to choose the form of this privacy
mechanism, and here we add Laplace distributed noise to a discretisation of the
location of each feature vector $X_i$ and to its label $Y_i$. The
classification rule is the privatized version of the well-studied partitioning
classification rule. In addition to the standard Lipschitz and margin
conditions, a novel characteristic is introduced, by which the exact rate of
convergence of the classification error probability is calculated, both for
non-private and private data.
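The following NumPy sketch illustrates the general recipe (Laplace noise added to a one-hot discretization of each feature and to each label, followed by a per-cell vote); the grid size, noise scales, and toy data are our illustrative choices, not the paper's calibrated mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, grid, n = 2.0, 5, 20000  # privacy budget and discretization (illustrative)

# Ground truth: label +1 on the right half of [0,1], -1 on the left
X = rng.uniform(0, 1, n)
Y = np.where(X > 0.5, 1.0, -1.0)

# Each record is privatized locally: a one-hot cell indicator and the label,
# both perturbed with Laplace noise before leaving the data holder
cells = np.minimum((X * grid).astype(int), grid - 1)
onehot = np.eye(grid)[cells] + rng.laplace(0, 2 / eps, (n, grid))
labels = Y + rng.laplace(0, 2 / eps, n)

# Privatized partitioning rule: per-cell weighted vote using noisy data only
votes = onehot.T @ labels      # one aggregate vote per grid cell
rule = np.sign(votes)          # predicted label per cell

# Evaluate on fresh points (non-private, just to check accuracy)
Xt = rng.uniform(0, 1, 2000)
pred = rule[np.minimum((Xt * grid).astype(int), grid - 1)]
acc = np.mean(pred == np.where(Xt > 0.5, 1.0, -1.0))
```

Because the noise is zero-mean, the per-cell votes concentrate around the true cell majorities as the sample size grows, which is what the rates in the paper quantify.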
( 2
min )
Robustness is a fundamental property of machine learning classifiers required
to achieve safety and reliability. In the field of adversarial robustness of
image classifiers, robustness is commonly defined as the stability of a model
to all input changes within a p-norm distance. However, in the field of random
corruption robustness, variations observed in the real world are used, while
p-norm corruptions are rarely considered. This study investigates the use of
random p-norm corruptions to augment the training and test data of image
classifiers. We evaluate the model robustness against imperceptible random
p-norm corruptions and propose a novel robustness metric. We empirically
investigate whether robustness transfers across different p-norms and derive
conclusions on which p-norm corruptions a model should be trained and
evaluated. We find that training data augmentation with a combination of p-norm
corruptions significantly improves corruption robustness, even on top of
state-of-the-art data augmentation schemes.
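As a sketch of what such a corruption generator could look like (our construction, not necessarily the one used in the study): draw coordinates from a generalized Gaussian with shape parameter p and normalize onto the l_p sphere of radius eps:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pnorm_corruption(shape, p, eps, rng):
    """Sample noise from the surface of the l_p ball of radius eps.
    |x|^p ~ Gamma(1/p) gives generalized-Gaussian coordinates; normalizing
    yields a direction on the l_p sphere (for p=2 this reduces to the
    familiar Gaussian construction)."""
    g = rng.gamma(1.0 / p, 1.0, size=shape) ** (1.0 / p)
    g *= rng.choice([-1.0, 1.0], size=shape)
    return eps * g / np.linalg.norm(g.ravel(), ord=p)

x = np.zeros((3, 32, 32))  # stand-in "image"
delta = random_pnorm_corruption(x.shape, p=2, eps=0.5, rng=rng)
```

Varying `p` then lets one augment training data with, and evaluate robustness against, different p-norm corruption families.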
( 2
min )
Black-box variational inference is widely used in situations where there is
no proof that its stochastic optimization succeeds. We suggest this is due to a
theoretical gap in existing stochastic optimization proofs: namely the
challenge of gradient estimators with unusual noise bounds, and a composite
non-smooth objective. For dense Gaussian variational families, we observe that
existing gradient estimators based on reparameterization satisfy a quadratic
noise bound and give novel convergence guarantees for proximal and projected
stochastic gradient descent using this bound. This provides rigorous guarantees
that methods similar to those used in practice converge on realistic inference
problems.
( 2
min )
The use of autonomous robots for assistance tasks in hospitals has the
potential to free up qualified staff and improve patient care. However, the
ubiquity of deformable and transparent objects in hospital settings poses
significant challenges to vision-based perception systems. We present
EfficientPPS, a neural architecture for part-aware panoptic segmentation that
provides robots with semantically rich visual information for grasping and
manipulation tasks. We also present an unsupervised data collection and
labelling method to reduce the need for human involvement in the training
process. EfficientPPS is evaluated on a dataset containing real-world hospital
objects and demonstrated to be robust and efficient in grasping transparent
transfusion bags with a collaborative robot arm.
( 2
min )
In this paper, we study the collaborative learning model, which concerns the
tradeoff between parallelism and communication overhead in multi-agent
multi-armed bandits. For regret minimization in multi-armed bandits, we present
the first set of tradeoffs between the number of rounds of communication among
the agents and the regret of the collaborative learning process.
( 2
min )
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D
radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable
training as well as fast 3D reconstruction at inference time. To overcome local
minima inherent to sparse and locally supported representations, we predict a
dense probability distribution over 3D and sample Gaussian means from that
probability distribution. We make this sampling operation differentiable via a
reparameterization trick, allowing us to back-propagate gradients through the
Gaussian splatting representation. We benchmark our method on wide-baseline
novel view synthesis on the real-world RealEstate10k and ACID datasets, where
we outperform state-of-the-art light field transformers and accelerate
rendering by 2.5 orders of magnitude while reconstructing an interpretable and
editable 3D radiance field.
( 2
min )
To better understand the output of deep neural networks (DNN), attribution
based methods have been an important approach for model interpretability, which
assign a score for each input dimension to indicate its importance towards the
model outcome. Notably, the attribution methods use the axioms of sensitivity
and implementation invariance to ensure the validity and reliability of
attribution results. Yet, the existing attribution methods present challenges
for effective interpretation and efficient computation. In this work, we
introduce MFABA, an attribution algorithm that adheres to these axioms, as a
novel method for interpreting DNNs. Additionally, we provide a theoretical
proof and in-depth analysis of the MFABA algorithm, and conduct a large-scale
experiment. The results demonstrate its superiority, running more than
101.5142 times faster than state-of-the-art attribution algorithms. The effectiveness of
MFABA is thoroughly evaluated through the statistical analysis in comparison to
other methods, and the full implementation package is open-source at:
https://github.com/LMBTough/MFABA
( 2
min )
This study explores the application of anomaly detection (AD) methods in
imbalanced learning tasks, focusing on fraud detection using real online credit
card payment data. We assess the performance of several recent AD methods and
compare their effectiveness against standard supervised learning methods.
Offering evidence of distribution shift within our dataset, we analyze its
impact on the tested models' performances. Our findings reveal that LightGBM
exhibits significantly superior performance across all evaluated metrics but
suffers more from distribution shifts than AD methods. Furthermore, our
investigation reveals that LightGBM also captures the majority of frauds
detected by AD methods. This observation challenges the potential benefit of
combining supervised and AD approaches in ensemble methods to enhance
performance. In summary, this research provides practical insights into the
utility of these techniques in real-world scenarios, showing LightGBM's
superiority in fraud detection while highlighting challenges related to
distribution shifts.
( 2
min )
Bayesian optimization (BO) is a sample-efficient method and has been widely
used for optimizing expensive black-box functions. Recently, there has been
considerable interest in the BO literature in optimizing functions that are
affected by a context variable in the environment, which is uncontrollable by
decision makers. In this paper, we focus on optimizing the expectation of a
function over a continuous context variable that follows an unknown
distribution. To address this problem, we propose two algorithms that employ
kernel density estimation to learn the probability density function (PDF) of
the continuous context variable online. The first algorithm is simpler and
directly optimizes the expectation under the estimated PDF. Considering that
the estimated PDF may have high estimation error when the true distribution is
complicated, we further propose the second algorithm that optimizes the
distributionally robust objective. Theoretical results demonstrate that both
algorithms have sub-linear Bayesian cumulative regret on the expectation
objective. Furthermore, we conduct numerical experiments to empirically
demonstrate the effectiveness of our algorithms.
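A bare-bones version of the first algorithm's key ingredient: estimate the context PDF with a Gaussian KDE and optimize the resulting expectation over a grid (the toy objective, bandwidth, and grid are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed context samples arriving online (here: a bimodal distribution)
contexts = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

def kde_pdf(x, data, bw=0.3):
    """Gaussian kernel density estimate of the context PDF."""
    diffs = (x[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * diffs**2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))

# Estimate the expectation of f(x, c) over the context c via the learned PDF
f = lambda x, c: -((x - c) ** 2)  # toy objective
grid = np.linspace(-5, 5, 401)
dx = grid[1] - grid[0]
w = kde_pdf(grid, contexts)
w /= w.sum() * dx  # normalize on the grid

expected_f = lambda x: np.sum(f(x, grid) * w) * dx
scores_grid = np.array([expected_f(x) for x in grid])
best_x = grid[np.argmax(scores_grid)]
```

For this quadratic objective the maximizer of the expectation is the mean of the context distribution, so `best_x` lands near zero; the distributionally robust variant in the paper instead guards against the KDE being wrong.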
( 2
min )
Policy gradient methods enjoy strong practical performance in numerous tasks
in reinforcement learning. Their theoretical understanding in multiagent
settings, however, remains limited, especially beyond two-player competitive
and potential Markov games. In this paper, we develop a new framework to
characterize optimistic policy gradient methods in multi-player Markov games
with a single controller. Specifically, under the further assumption that the
game exhibits an equilibrium collapse, in that the marginals of coarse
correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to
stationary $\epsilon$-NE in $O(1/\epsilon^2)$ iterations, where $O(\cdot)$
suppresses polynomial factors in the natural parameters of the game. Such an
equilibrium collapse is well-known to manifest itself in two-player zero-sum
Markov games, but also occurs even in a class of multi-player Markov games with
separable interactions, as established by recent work. As a result, we bypass
known complexity barriers for computing stationary NE when either of our
assumptions fails. Our approach relies on a natural generalization of the
classical Minty property that we introduce, which we anticipate to have further
applications beyond Markov games.
( 2
min )
Tabular data analysis is crucial in various fields, and large language models
show promise in this area. However, current research mostly focuses on
rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like
forecasting and chart generation. To address this gap, we developed the
Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond
the SQL-compatible operations and require more in-depth analysis. We also
develop five innovative and effective annotation methods, harnessing the
capabilities of large language models to enhance data quality and quantity.
Additionally, we include unclear queries that resemble real-world user
questions to test how well models can understand and tackle such challenges.
Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five
state-of-the-art models using three different metrics, and the results show
that our benchmark introduces considerable challenges in the field of tabular
data analysis, paving the way for more advanced research opportunities.
( 2
min )
Recently, the multi-armed bandit problem has arisen in many real-life
scenarios where arms must be sampled in batches, due to the limited time the
agent can wait for feedback. Such applications include biological
experimentation and online marketing. The problem is further complicated when
the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
( 2
min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the
well-established Metropolis-Adjusted Langevin Algorithm (MALA) with
momentum-based optimization using Adam and leverages a prolate proposal
distribution to efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
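For reference, plain MALA (without the Adam momentum or the prolate proposal the paper adds) reduces to the following loop; the 1-D standard-normal target and step size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: standard normal; log-density and its gradient
logp = lambda x: -0.5 * x**2
grad = lambda x: -x

h = 0.5    # step size
x = 3.0    # initial state
samples = []
for _ in range(20000):
    # Langevin proposal: half gradient step plus Gaussian noise
    mean_fwd = x + 0.5 * h * grad(x)
    y = mean_fwd + np.sqrt(h) * rng.normal()
    # Metropolis correction accounting for the asymmetric proposal
    mean_bwd = y + 0.5 * h * grad(y)
    log_q_fwd = -((y - mean_fwd) ** 2) / (2 * h)
    log_q_bwd = -((x - mean_bwd) ** 2) / (2 * h)
    log_alpha = logp(y) - logp(x) + log_q_bwd - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        x = y
    samples.append(x)

samples = np.array(samples[2000:])  # drop burn-in
```

The accept/reject correction is what makes the target an exact invariant distribution of the chain, mirroring the invariance result proved in the paper for the Gibbs posterior.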
( 2
min )
Pufferfish privacy is a flexible generalization of differential privacy that
allows modeling arbitrary secrets and the adversary's prior knowledge about
the data. Unfortunately, designing general and tractable Pufferfish mechanisms that
do not compromise utility is challenging. Furthermore, this framework does not
provide the composition guarantees needed for a direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a R\'enyi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
( 2
min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousands. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. On a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
( 2
min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the connectedness property admits a transparent
proof. However, since we did not discuss formal concept analysis within
Blocher et al. [2023], we outsourced the proof to this note.
( 2
min )
This paper considers the epistemic justification for a simplicity preference
in inductive inference that may be obtained from the machine learning framework
of statistical learning theory. Uniting elements from both earlier arguments
suggesting and rejecting such a justification, the paper spells out a qualified
means-ends and model-relative justificatory argument, built on statistical
learning theory's central mathematical learning guarantee for the method of
empirical risk minimization.
( 2
min )
Generating counterfactual explanations is one of the most effective
approaches for uncovering the inner workings of black-box neural network models
and building user trust. While remarkable strides have been made in generative
modeling using diffusion models in domains like vision, their utility in
generating counterfactual explanations in structured modalities remains
unexplored. In this paper, we introduce the Structured Counterfactual Diffuser
(SCD), the first plug-and-play framework leveraging diffusion for generating
counterfactual explanations in structured data. SCD learns the underlying data
distribution via a diffusion model which is then guided at test time to
generate counterfactuals for any arbitrary black-box model, input, and desired
prediction. Our experiments show that our counterfactuals not only exhibit high
plausibility compared to the existing state-of-the-art but also show
significantly better proximity and diversity.
( 2
min )
Recent works have shown that physics-inspired architectures allow the
training of deep graph neural networks (GNNs) without oversmoothing. The role
of these physics is unclear, however, with successful examples of both
reversible (e.g., Hamiltonian) and irreversible (e.g., diffusion) phenomena
producing comparable results despite diametrically opposed mechanisms, and
further complications arising due to empirical departures from mathematical
theory. This work presents a series of novel GNN architectures based upon
structure-preserving bracket-based dynamical systems, which are provably
guaranteed to either conserve energy or generate positive dissipation with
increasing depth. It is shown that the theoretically principled framework
employed here allows for inherently explainable constructions, which
contextualize departures from theory in current architectures and better
elucidate the roles of reversibility and irreversibility in network
performance.
( 2
min )
Accurate land use maps, describing the territory from an anthropic
utilisation point of view, are useful tools for land management and planning.
To produce them, the use of optical images alone remains limited. It is
therefore necessary to make use of several heterogeneous sources, each carrying
complementary or contradictory information due to their imperfections or their
different specifications. This study compares two approaches, i.e.,
pre-classification and post-classification fusion, for combining several
sources of spatial data in the context of land use classification. The
approaches are applied on authoritative land use data located in the Gers
department in the southwest of France. Pre-classification fusion, while not
explicitly modeling imperfections, has the best final results, reaching an
overall accuracy of 97% and a macro-mean F1 score of 88%.
( 2
min )
Existing algorithms for reinforcement learning from human feedback (RLHF) can
incentivize responses at odds with preferences because they are based on models
that assume independence of irrelevant alternatives (IIA). The perverse
incentives induced by IIA give rise to egregious behavior when innovating on
query formats or learning algorithms.
( 2
min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, so the learning signal arrives only at the
terminal time, resulting in sluggish credit assignment. In this work, we
present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2
min )
Through this paper, we introduce a novel driver cognitive load assessment
dataset, CL-Drive, which contains Electroencephalogram (EEG) signals along with
other physiological signals such as Electrocardiography (ECG) and Electrodermal
Activity (EDA) as well as eye tracking data. The data was collected from 21
subjects while driving in an immersive vehicle simulator, in various driving
conditions, to induce different levels of cognitive load in the subjects. The
tasks consisted of 9 complexity levels for 3 minutes each. Each driver reported
their subjective cognitive load every 10 seconds throughout the experiment. The
dataset contains the subjective cognitive load recorded as ground truth. In
this paper, we also provide benchmark classification results for different
machine learning and deep learning models for both binary and ternary label
distributions. We followed two evaluation criteria, namely 10-fold
cross-validation and leave-one-subject-out (LOSO). We trained our models on
both hand-crafted
features as well as on raw data.
( 3
min )
The effectiveness of digital treatments can be measured by requiring patients
to self-report their state through applications; however, this can be
overwhelming and cause disengagement. We conduct a study to explore the impact
of gamification on self-reporting. Our approach involves the creation of a
system to assess cognitive load (CL) through the analysis of
photoplethysmography (PPG) signals. The data from 11 participants is utilized
to train a machine learning model to detect CL. Subsequently, we create two
versions of surveys: a gamified and a traditional one. We estimate the CL
experienced by other participants (13) while completing surveys. We find that
CL detector performance can be enhanced via pre-training on stress detection
tasks. For 10 out of 13 participants, a personalized CL detector can achieve an
F1 score above 0.7. We find no difference between the gamified and non-gamified
surveys in terms of CL but participants prefer the gamified version.
( 3
min )
Multi-relational clustering is a challenging task due to the fact that
diverse semantic information conveyed in multi-layer graphs is difficult to
extract and fuse. Recent methods integrate topology structure and node
attribute information through graph filtering. However, they often use a
low-pass filter without fully considering the correlation among multiple
graphs. To overcome this drawback, we propose to learn a graph filter motivated
by the theoretical analysis of Barlow Twins. We find that input with a negative
semi-definite inner product provides a lower bound for Barlow Twins loss, which
prevents it from reaching a better solution. We thus learn a filter that yields
an upper bound for Barlow Twins. Afterward, we design a simple clustering
architecture and demonstrate its state-of-the-art performance on four benchmark
datasets.
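For context, the Barlow Twins objective referenced in the analysis can be sketched in NumPy as follows (batch-normalized embeddings whose cross-correlation matrix is pushed toward the identity; `lam` is an illustrative trade-off weight):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins loss: invariance term (diagonal of the cross-correlation
    matrix pulled to 1) plus redundancy-reduction term (off-diagonal pulled
    to 0) on batch-normalized embeddings."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = z1.T @ z2 / n                               # d x d cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c**2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)                      # near-zero loss
loss_diff = barlow_twins_loss(z, rng.normal(size=(256, 8)))
```

The lower bound discussed in the abstract concerns inputs whose inner-product structure keeps this cross-correlation matrix away from the identity, which motivates learning the graph filter.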
( 2
min )
Stochastic optimal control of dynamical systems is a crucial challenge in
sequential decision-making. Recently, control-as-inference approaches have had
considerable success, providing a viable risk-sensitive framework to address
the exploration-exploitation dilemma. Nonetheless, a majority of these
techniques only invoke the inference-control duality to derive a modified risk
objective that is then addressed within a reinforcement learning framework.
This paper introduces a novel perspective by framing risk-sensitive stochastic
control as Markovian score climbing under samples drawn from a conditional
particle filter. Our approach, while purely inference-centric, provides
asymptotically unbiased estimates for gradient-based policy optimization with
optimal importance weighting and no explicit value function learning. To
validate our methodology, we apply it to the task of learning neural
non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks
of stochastic dynamical systems.
( 2
min )
Despite the great popularity of virtual screening of existing compound
libraries, the search for new potential drug candidates also takes advantage of
generative protocols, where new compound suggestions are enumerated using
various algorithms. To increase the activity potency of generative approaches,
they have recently been coupled with molecular docking, a leading methodology
of structure-based drug design. In this review, we summarize progress since
docking-based generative models emerged. We propose a new taxonomy for these
methods and discuss their importance for the field of computer-aided drug
design. In addition, we discuss the most promising directions for further
development of generative protocols coupled with docking.
( 2
min )
We study convergence rates of loss and uncertainty-based active learning
algorithms under various assumptions. First, we provide a set of conditions
under which a convergence rate guarantee holds, and use this for linear
classifiers and linearly separable datasets to show convergence rate guarantees
for loss-based sampling and different loss functions. Second, we provide a
framework that allows us to derive convergence rate bounds for loss-based
sampling by deploying known convergence rate bounds for stochastic gradient
descent algorithms. Third, and last, we propose an active learning algorithm
that combines sampling of points and stochastic Polyak's step size. We show a
condition on the sampling that ensures a convergence rate guarantee for this
algorithm for smooth convex loss functions. Our numerical results demonstrate
the efficiency of the proposed algorithm.
( 2
min )
Industrial robots are applied in a widening range of industries, but robot
programming mostly remains a task limited to programming experts. We propose a
natural language-based assistant for programming of advanced, industrial
robotic applications and investigate strategies for domain-specific fine-tuning
of foundation models with limited data and compute.
( 2
min )
We propose to train neural networks (NNs) using a novel variant of the
``Additively Preconditioned Trust-region Strategy'' (APTS). The proposed method
is based on a parallelizable additive domain decomposition approach applied to
the neural network's parameters. Built upon the TR framework, the APTS method
ensures global convergence towards a minimizer. Moreover, it eliminates the
need for computationally expensive hyper-parameter tuning, as the TR algorithm
automatically determines the step size in each iteration. We demonstrate the
capabilities, strengths, and limitations of the proposed APTS training method
by performing a series of numerical experiments. The presented numerical study
includes a comparison with widely used training methods such as SGD, Adam,
LBFGS, and the standard TR method.
( 2
min )
Transformer-based Large Language Models (LLMs) have become a fixture in
modern machine learning. Correspondingly, significant resources are allocated
towards research that aims to further advance this technology, typically
resulting in models of increasing size that are trained on increasing amounts
of data. This work, however, demonstrates the surprising result that it is
often possible to significantly improve the performance of LLMs by selectively
removing higher-order components of their weight matrices. This simple
intervention, which we call LAyer-SElective Rank reduction (LASER), can be done
on a model after training has completed, and requires no additional parameters
or data. We show extensive experiments demonstrating the generality of this
finding across language models and datasets, and provide in-depth analyses
offering insights into both when LASER is effective and the mechanism by which
it operates.
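The core rank-reduction operation behind LASER can be sketched with a truncated SVD. Which layers to modify and what fraction of the rank to keep are choices the paper searches over; the values below are purely illustrative.

```python
import numpy as np

# Replace a weight matrix by a low-rank approximation that drops its
# higher-order (small singular value) components.
def rank_reduce(W, keep_fraction=0.25):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(len(s) * keep_fraction))   # number of components kept
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))             # stand-in for a layer's weights
W_lr = rank_reduce(W, keep_fraction=0.25)     # rank reduced from 64 to 16
```

Applied after training, this intervention adds no parameters and needs no data, which is what makes the reported gains surprising.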
( 2
min )
InvertibleNetworks.jl is a Julia package designed for the scalable
implementation of normalizing flows, a method for density estimation and
sampling in high-dimensional distributions. This package excels in memory
efficiency by leveraging the inherent invertibility of normalizing flows, which
significantly reduces memory requirements during backpropagation compared to
existing normalizing flow packages that rely on automatic differentiation
frameworks. InvertibleNetworks.jl has been adapted for diverse applications,
including seismic imaging, medical imaging, and CO2 monitoring, demonstrating
its effectiveness in learning high-dimensional distributions.
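The memory saving described above rests on invertibility: a coupling layer's input can be reconstructed exactly from its output, so activations need not be stored for backpropagation. InvertibleNetworks.jl implements this in Julia; the following is a NumPy illustration of an additive coupling layer, with an arbitrary stand-in subnetwork `t`.

```python
import numpy as np

# Additive coupling layer: splits the input, shifts one half by a function
# of the other. The inverse subtracts the same shift, so no activation
# caching is required during the backward pass.
def coupling_forward(x1, x2, t):
    return x1, x2 + t(x1)

def coupling_inverse(y1, y2, t):
    return y1, y2 - t(y1)

t = lambda x: np.tanh(x)            # subnetwork need not itself be invertible
x1, x2 = np.array([0.3, -1.2]), np.array([2.0, 0.5])
y1, y2 = coupling_forward(x1, x2, t)
r1, r2 = coupling_inverse(y1, y2, t)   # recovers (x1, x2) exactly
```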
( 2
min )
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that enables learning a new task,
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective in representing locally a quadratic
training loss, these simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
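For context, mirror descent with a fixed mirror map can be sketched as below; the paper's contribution is to *learn* a nonlinear mirror map, which this illustration does not do. The negative-entropy map on the simplex (exponentiated gradient) is a standard textbook choice, used here purely for illustration.

```python
import numpy as np

# Mirror descent with the negative-entropy mirror map on the probability
# simplex: multiplicative (exponentiated-gradient) updates followed by
# renormalization back onto the simplex.
def mirror_descent_simplex(grad, x0, lr=0.1, steps=500):
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))   # update in the dual (mirror) space
        x = x / x.sum()                 # map back to the simplex
    return x

# Minimize the linear loss <c, x> over the simplex; mass concentrates
# on the coordinate with the smallest cost.
c = np.array([0.5, 0.2, 0.9])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

The choice of mirror map induces the distance metric the updates respect, which is exactly the degree of freedom the abstract proposes to learn.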
( 2
min )
Many inference scenarios rely on extracting relevant information from known
data in order to make future predictions. When the underlying stochastic
process satisfies certain assumptions, there is a direct mapping between its
exact classical and quantum simulators, with the latter asymptotically using
less memory. Here we focus on studying whether such quantum advantage persists
when those assumptions are not satisfied, and the model is doomed to have
imperfect accuracy. By studying the trade-off between accuracy and memory
requirements, we show that quantum models can reach the same accuracy with less
memory, or alternatively, better accuracy with the same memory. Finally, we
discuss the implications of this result for learning tasks.
( 2
min )
Physics-based simulations can be very time-consuming and computationally
demanding. One way of accelerating these processes is by making use of data-driven
surrogate models that learn from existing simulations. Ensembling methods are
particularly relevant in this domain as their smoothness properties coincide
with the smoothness of physical phenomena. The drawback is that they can remain
costly. This research project focused on studying Packed-Ensembles that
generalize Deep Ensembles but remain faster to train. Several models have been
trained and compared in terms of multiple important metrics. PE(8,4,1) has been
identified as the clear winner in this particular task, outperforming its Deep
Ensemble counterpart while accelerating training by 25%.
( 2
min )
The generation of cold atom clouds is a complex process which involves the
optimization of noisy data in high dimensional parameter spaces. Optimization
can be challenging both in and especially outside of the lab due to lack of
time, expertise, or access for lengthy manual optimization. In recent years, it
was demonstrated that machine learning offers a solution since it can optimize
high dimensional problems quickly, without knowledge of the experiment itself.
In this paper we present results showing the benchmarking of nine different
optimization techniques and implementations, alongside their ability to
optimize a Rubidium (Rb) cold atom experiment. The investigations are performed
on a 3D $^{87}$Rb molasses with 10 and 18 adjustable parameters,
where the atom number obtained by absorption imaging was chosen as the test
problem. We further compare the best performing optimizers under different
effective noise conditions by reducing the Signal-to-Noise ratio of the images
via adapting the atomic vapor pressure in the 2D+ MOT and the detection laser
frequency stability.
( 2
min )
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
( 2
min )
In this paper, we revisit the bilevel optimization problem, in which the
upper-level objective function is generally nonconvex and the lower-level
objective function is strongly convex. Although this type of problem has been
studied extensively, it still remains an open question how to achieve an
${O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic
bilevel optimization without any second-order derivative computation. To fill
this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named
FdeHBO, which features a simple fully single-loop structure, a projection-aided
finite-difference Hessian/Jacobian-vector approximation, and momentum-based
updates. Theoretically, we show that FdeHBO requires ${O}(\epsilon^{-1.5})$
iterations (each using ${O}(1)$ samples and only first-order gradient
information) to find an $\epsilon$-accurate stationary point. As far as we
know, this is the first Hessian/Jacobian-free method with an
${O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex
stochastic bilevel optimization.
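The finite-difference Hessian/Jacobian-vector approximation that such Hessian/Jacobian-free methods build on can be sketched in a few lines: the product Hv is approximated with two gradient evaluations, so no second-order computation is ever performed. The function name and test problem below are illustrative.

```python
import numpy as np

# Central finite-difference approximation of a Hessian-vector product:
# H v ~ (grad f(x + d v) - grad f(x - d v)) / (2 d), first-order info only.
def hvp_fd(grad, x, v, delta=1e-5):
    return (grad(x + delta * v) - grad(x - delta * v)) / (2 * delta)

# Example on f(x) = 0.5 x^T A x, whose Hessian is exactly A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_f = lambda x: A @ x
v = np.array([1.0, -1.0])
hv = hvp_fd(grad_f, np.zeros(2), v)   # approximates A @ v
```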
( 2
min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, resulting in sluggish credit assignment because
the learning signal is present only at the
terminal time. In this work, we present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2
min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the well
established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based
optimization using Adam and leverages a prolate proposal distribution to
efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
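A plain MALA step (without the paper's Adam-style momentum or prolate proposal) can be sketched as follows, targeting a standard Gaussian for illustration; the step size and sample counts are arbitrary choices.

```python
import numpy as np

# Metropolis Adjusted Langevin Algorithm: a Langevin-drift proposal
# corrected by a Metropolis accept/reject step, which makes the target
# density invariant for the chain.
def mala(log_p, grad_log_p, x0, step=0.5, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = float(x0), []
    for _ in range(n):
        mean_x = x + step * grad_log_p(x)
        y = mean_x + np.sqrt(2 * step) * rng.standard_normal()
        # Proposal is asymmetric, so both directions enter the ratio.
        log_q_fwd = -(y - mean_x) ** 2 / (4 * step)
        mean_y = y + step * grad_log_p(y)
        log_q_rev = -(x - mean_y) ** 2 / (4 * step)
        if np.log(rng.random()) < log_p(y) - log_p(x) + log_q_rev - log_q_fwd:
            x = y
        samples.append(x)
    return np.array(samples)

# Target: standard normal, log p(x) = -x^2/2 up to a constant.
draws = mala(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0)
```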
( 2
min )
Recently, the multi-armed bandit problem has arisen in many real-life scenarios
where arms must be sampled in batches, owing to the limited time the agent can
wait for feedback. Such applications include biological experimentation and
online marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
( 2
min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousands. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. With a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
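The convex clustering objective that CCMM minimizes combines a squared-error fidelity term with pairwise fusion penalties on the centroids; as the regularization weight grows, centroids merge, producing the (approximately) hierarchical structure. A minimal sketch, with uniform weights w_ij = 1 purely for illustration:

```python
import numpy as np

# Convex clustering objective:
#   0.5 * sum_i ||x_i - u_i||^2  +  lam * sum_{i<j} w_ij ||u_i - u_j||
def convex_clustering_obj(X, U, lam):
    fidelity = 0.5 * np.sum((X - U) ** 2)
    n = X.shape[0]
    fusion = sum(np.linalg.norm(U[i] - U[j])
                 for i in range(n) for j in range(i + 1, n))
    return fidelity + lam * fusion

X = np.array([[0.0, 0.0], [1.0, 1.0]])
# With lam = 0 the optimal centroids sit on the data; growing lam pulls
# them together until all points share one centroid (one cluster).
obj_at_data = convex_clustering_obj(X, X.copy(), lam=0.0)
```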
( 2
min )
Pufferfish privacy is a flexible generalization of differential privacy that
allows one to model arbitrary secrets and the adversary's prior knowledge about
the data. Unfortunately, designing general and tractable Pufferfish mechanisms that
do not compromise utility is challenging. Furthermore, this framework does not
provide the composition guarantees needed for a direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a R\'enyi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
( 2
min )
Large language model (LLM) training has surged in popularity over the last year with the release of several popular models such as Llama 2, Falcon, and Mistral. Customers are now pre-training and fine-tuning LLMs ranging from 1 billion to over 175 billion parameters to optimize model performance for applications across industries, from healthcare to finance […]
( 9
min )
Today, we are excited to announce that the Mixtral-8x7B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The Mixtral-8x7B LLM is a pre-trained sparse mixture of expert model, based on a 7-billion parameter backbone with eight experts per feed-forward […]
( 11
min )
This blog is co-written with Josh Reini, Shayak Sen, and Anupam Datta from TruEra. Amazon SageMaker JumpStart provides a variety of pretrained foundation models, such as Llama-2 and Mistral 7B, that can be quickly deployed to an endpoint. These foundation models perform well with generative tasks, from crafting text and summaries, answering questions, to producing […]
( 12
min )
Generative AI agents are capable of producing human-like responses and engaging in natural language conversations by orchestrating a chain of calls to foundation models (FMs) and other augmenting tools based on user input. Instead of only fulfilling predefined intents through a static decision tree, agents are autonomous within the context of their suite of available […]
( 15
min )
As I completed this blog series, the European Union (EU) announced its AI Regulation Law. The European Union’s AI Regulation Act seeks to ensure AI’s ethical and safe deployment in the EU. Coming on the heels of the White House’s “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” we…
The post Creating a More Fair, Just, and Prosperous Brave New World with AI Summary appeared first on Data Science Central.
( 21
min )
Master's students Irene Terpstra ’23 and Rujul Gandhi ’22 use language to design new integrated circuits and make it understandable to robots.
( 9
min )
AI saw unparalleled growth in 2023, reaching millions daily. This progress owes much to the extensive work of Microsoft researchers and collaborators. In this review, learn about the advances in 2023, which set the stage for further progress in 2024.
The post Research at Microsoft 2023: A year of groundbreaking AI advances and discoveries appeared first on Microsoft Research.
( 17
min )
Quantization replaces floating point arithmetic with integer arithmetic in
deep neural network models, providing more efficient on-device inference with
less power and memory. In this work, we propose a framework for formally
verifying properties of quantized neural networks. Our baseline technique is
based on integer linear programming which guarantees both soundness and
completeness. We then show how efficiency can be improved by utilizing
gradient-based heuristic search methods and also bound-propagation techniques.
We evaluate our approach on perception networks quantized with PyTorch. Our
results show that we can verify quantized networks with better scalability and
efficiency than the previous state of the art.
( 2
min )
Deep generative models, such as diffusion models, GANs, and IMLE, have shown
impressive capability in tackling inverse problems. However, the validity of
model-generated solutions w.r.t. the forward problem and the reliability of
associated uncertainty estimates remain understudied. This study evaluates
recent diffusion-based, GAN-based, and IMLE-based methods on three inverse
problems, i.e., $16\times$ super-resolution, colourization, and image
decompression. We assess the validity of these models' outputs as solutions to
the inverse problems and conduct a thorough analysis of the reliability of the
models' estimates of uncertainty over the solution. Overall, we find that the
IMLE-based CHIMLE method outperforms other methods in terms of producing valid
solutions and reliable uncertainty estimates.
( 2
min )
The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence in great demand.
We initiate the study of data-driven AES selection for online
experimentation services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model considering the heteroskedasticity across
experiments, and it seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
( 2
min )
Fusing measurements from multiple, heterogeneous, partial sources observing
a common object or process poses challenges as the number and types of
available sensors grow. In this work we propose, implement, and
validate an end-to-end computational pipeline in the form of a
multiple-auto-encoder neural network architecture for this task. The inputs to
the pipeline are several sets of partial observations, and the result is a
globally consistent latent space, harmonizing (rigidifying, fusing) all
measurements. The key enabler is the availability of multiple slightly
perturbed measurements of each instance:, local measurement, "bursts", that
allows us to estimate the local distortion induced by each instrument. We
demonstrate the approach in a sequence of examples, starting with simple
two-dimensional data sets and proceeding to a Wi-Fi localization problem and to
the solution of a "dynamical puzzle" arising in spatio-temporal observations of
the solutions of Partial Differential Equations.
( 2
min )
We introduce the Efficient Title Reranker (ETR) via Broadcasting Query Encoder,
a novel title reranking technique that achieves efficient title reranking
20x-40x faster than a vanilla passage reranker. However, one of the challenges
in training the Efficient Title Reranker is instability. Analyzing the issue,
we found that some very difficult ground truths might act as noisy labels,
causing accuracy to drop, and that some extreme values in the model's
probability output cause NaNs. To address these issues, we introduce the
Sigmoid Trick, a novel technique that reduces the gradient update in both
cases, resulting in better retrieval efficacy. Experiments showed the
effectiveness of ETR and the Sigmoid Trick, as we achieved four
state-of-the-art positions on the KILT knowledge benchmark.
( 2
min )
We present a novel approach to non-convex optimization with certificates,
which handles smooth functions on the hypercube or on the torus. Unlike
traditional methods that rely on algebraic properties, our algorithm exploits
the regularity of the target function intrinsic in the decay of its Fourier
spectrum. By defining a tractable family of models, we simultaneously obtain
precise certificates and leverage the advanced and powerful computational
techniques developed to optimize neural networks. In this way, the scalability
of our approach is naturally enhanced by parallel computing with GPUs. When
applied to polynomials of moderate dimension but with thousands of
coefficients, our approach outperforms state-of-the-art optimization methods
with certificates, such as those based on Lasserre's hierarchy, addressing
problems that are intractable for the competitors.
( 2
min )
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method by actively
de-noising the observed data. By conducting a broad range of experiments, we
demonstrate that our proposed approach provides a much closer approximation to
the actual data uncertainty than the standard method.
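The "variance attenuation" objective the abstract critiques is, in standard form, a heteroscedastic Gaussian negative log-likelihood in which a predicted variance models aleatoric uncertainty. The sketch below (illustrative names and numbers) shows why it overestimates: for a fixed mean prediction, the NLL is minimized when the variance absorbs the entire mean squared residual, model error included.

```python
import numpy as np

# Heteroscedastic Gaussian NLL: 0.5 * (log s^2 + (y - mu)^2 / s^2),
# averaged over the data; log_var parameterizes the predicted variance.
def gaussian_nll(y, mu, log_var):
    return 0.5 * (log_var + (y - mu) ** 2 * np.exp(-log_var)).mean()

y = np.array([1.0, 2.0, 3.0, 4.0])       # observations
mu = np.array([1.1, 1.9, 3.2, 3.8])      # imperfect mean predictions
# The optimal variance equals the mean squared residual, so any model
# error is (mis)attributed to data noise.
best_log_var = np.log(((y - mu) ** 2).mean())
```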
( 2
min )
Generative Adversarial Networks (GANs) have become a ubiquitous technology
for data generation, with their prowess in image generation being
well-established. However, their application in generating tabular data has
been less than ideal. Furthermore, attempting to incorporate differential
privacy technology into these frameworks has often resulted in a degradation of
data utility. To tackle these challenges, this paper introduces DP-SACTGAN, a
novel Conditional Generative Adversarial Network (CGAN) framework for
differentially private tabular data generation. Experimental findings
demonstrate that DP-SACTGAN not only
accurately models the distribution of the original data but also effectively
satisfies the requirements of differential privacy.
( 2
min )
Measurement-based quantum computation (MBQC) is a paradigm for quantum
computation where computation is driven by local measurements on a suitably
entangled resource state. In this work we show that MBQC is related to a model
of quantum computation based on Clifford quantum cellular automata (CQCA).
Specifically, we show that certain MBQCs can be directly constructed from CQCAs
which yields a simple and intuitive circuit model representation of MBQC in
terms of quantum computation based on CQCA. We apply this description to
construct various MBQC-based Ans\"atze for parameterized quantum circuits,
demonstrating that the different Ans\"atze may lead to significantly different
performances on different learning tasks. In this way, MBQC yields a family of
Hardware-efficient Ans\"atze that may be adapted to specific problem settings
and is particularly well suited for architectures with translationally
invariant gates such as neutral atoms.
( 2
min )
External control arms (ECA) can inform the early clinical development of
experimental drugs and provide efficacy evidence for regulatory approval in
non-randomized settings. However, the main challenge of implementing ECA lies
in accessing real-world data or historical clinical trials. Indeed, data
sharing is often not feasible due to privacy considerations related to data
leaving the original collection centers, along with pharmaceutical companies'
competitive motives. In this paper, we leverage a privacy-enhancing technology
called federated learning (FL) to remove some of the barriers to data sharing.
We introduce a federated learning inverse probability of treatment weighted
(IPTW) method for time-to-event outcomes called FedECA which eases the
implementation of ECA by limiting patients' data exposure. We show with
extensive experiments that FedECA outperforms its closest competitor,
matching-adjusted indirect comparison (MAIC), in terms of statistical power and
ability to balance the treatment and control groups. To encourage the use of
such methods, we publicly release our code which relies on Substra, an
open-source FL software with proven experience in privacy-sensitive contexts.
( 3
min )
Text segmentation, the task of dividing a document into sections, is often a
prerequisite for performing additional natural language processing tasks.
Existing text segmentation methods have typically been developed and tested
using clean, narrative-style text with segments containing distinct topics.
Here we consider a challenging text segmentation task: dividing newspaper
marriage announcement lists into units of one announcement each. In many cases
the information is not structured into sentences, and adjacent segments are not
topically distinct from each other. In addition, the text of the announcements,
which is derived from images of historical newspapers via optical character
recognition, contains many typographical errors. As a result, these
announcements are not amenable to segmentation with existing techniques. We
present a novel deep learning-based model for segmenting such text and show
that it significantly outperforms an existing state-of-the-art method on our
task.
( 2
min )
We propose a novel machine learning method for sampling from the
high-dimensional probability distributions of Lattice Field Theories, which is
based on a single neural ODE layer and incorporates the full symmetries of the
problem. We test our model on the $\phi^4$ theory, showing that it
systematically outperforms previously proposed flow-based methods in sampling
efficiency, and the improvement is especially pronounced for larger lattices.
Furthermore, we demonstrate that our model can learn a continuous family of
theories at once, and the results of learning can be transferred to larger
lattices. Such generalizations further accentuate the advantages of machine
learning methods.
( 2
min )
Nowadays, neural-network-based image- and video-quality metrics show better
performance than traditional methods. However, they have also become more
vulnerable to adversarial attacks that increase metrics' scores without
improving visual quality. The existing benchmarks of quality metrics compare
their performance in terms of correlation with subjective quality and
calculation time. However, the adversarial robustness of image-quality metrics
is also an area worth researching. In this paper, we analyse modern metrics'
robustness to different adversarial attacks. We adopted adversarial attacks
from computer vision tasks and compared attacks' efficiency against 15
no-reference image/video-quality metrics. Some metrics showed high resistance
to adversarial attacks which makes their usage in benchmarks safer than
vulnerable metrics. The benchmark accepts new metrics submissions for
researchers who want to make their metrics more robust to attacks or to find
such metrics for their needs. Try our benchmark using pip install
robustness-benchmark.
( 2
min )
We propose to learn non-convex regularizers with a prescribed upper bound on
their weak-convexity modulus. Such regularizers give rise to variational
denoisers that minimize a convex energy. They rely on few parameters (less than
15,000) and offer a signal-processing interpretation as they mimic handcrafted
sparsity-promoting regularizers. Through numerical experiments, we show that
such denoisers outperform convex-regularization methods as well as the popular
BM3D denoiser. Additionally, the learned regularizer can be deployed to solve
inverse problems with iterative schemes that provably converge. For both CT and
MRI reconstruction, the regularizer generalizes well and offers an excellent
tradeoff between performance, number of parameters, guarantees, and
interpretability when compared to other data-driven approaches.
( 2
min )
Recent studies show that deep reinforcement learning (DRL) agents tend to
overfit to the task on which they were trained and fail to adapt to minor
environment changes. To expedite learning when transferring to unseen tasks, we
propose a novel approach to representing the current task using reward machines
(RMs), state machine abstractions that induce subtasks based on the current
task's rewards and dynamics. Our method provides agents with symbolic
representations of optimal transitions from their current abstract state and
rewards them for achieving these transitions. These representations are shared
across tasks, allowing agents to exploit knowledge of previously encountered
symbols and transitions, thus enhancing transfer. Empirical results show that
our representations improve sample efficiency and few-shot transfer in a
variety of domains.
( 2
min )
We propose a simple and general framework for nonparametric estimation of
heterogeneous treatment effects under fairness constraints. Under standard
regularity conditions, we show that the resulting estimators possess the double
robustness property. We use this framework to characterize the trade-off
between fairness and the maximum welfare achievable by the optimal policy. We
evaluate the methods in a simulation study and illustrate them in a real-world
case study.
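The double robustness property mentioned above is commonly obtained via AIPW-style estimators, which combine outcome models with a propensity score so the estimate stays consistent if either component is correctly specified. A minimal sketch, with illustrative names and a noiseless simulated example rather than the paper's estimator:

```python
import numpy as np

# Augmented inverse-probability-weighted (AIPW) estimate of an average
# treatment effect: outcome-model difference plus propensity-weighted
# residual corrections.
def aipw_ate(y, a, mu1, mu0, e):
    return np.mean(mu1 - mu0
                   + a * (y - mu1) / e
                   - (1 - a) * (y - mu0) / (1 - e))

rng = np.random.default_rng(0)
n = 1000
mu0 = rng.standard_normal(n)            # outcome model under control
mu1 = mu0 + 2.0                         # true effect of 2 for everyone
a = rng.integers(0, 2, n)               # randomized treatment, e = 0.5
y = np.where(a == 1, mu1, mu0)          # noiseless outcomes for clarity
est = aipw_ate(y, a, mu1, mu0, 0.5)
```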
( 2
min )
The recent popularity of text-to-image diffusion models (DM) can largely be
attributed to the intuitive interface they provide to users. The intended
generation can be expressed in natural language, with the model producing
faithful interpretations of text prompts. However, expressing complex or
nuanced ideas in text alone can be difficult. To ease image generation, we
propose MultiFusion that allows one to express complex and nuanced concepts
with arbitrarily interleaved inputs of multiple modalities and languages.
MultiFusion leverages pre-trained models and aligns them for integration into a
cohesive system, thereby avoiding the need for extensive training from scratch.
Our experimental results demonstrate the efficient transfer of capabilities
from individual modules to the downstream model. Specifically, the fusion of
all independent components allows the image generation module to utilize
multilingual, interleaved multimodal inputs despite being trained solely on
monomodal data in a single language.
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
Time Series Classification and Extrinsic Regression are important and
challenging machine learning tasks. Deep learning has revolutionized natural
language processing and computer vision and holds great promise in other fields
such as time series analysis where the relevant features must often be
abstracted from the raw data but are not known a priori. This paper surveys the
current state of the art in the fast-moving field of deep learning for time
series classification and extrinsic regression. We review different network
architectures and training methods used for these tasks and discuss the
challenges and opportunities when applying deep learning to time series data.
We also summarize two critical applications of time series classification and
extrinsic regression, human activity recognition and satellite earth
observation.
( 2
min )
To mitigate global warming, greenhouse gas sources need to be resolved at a
high spatial resolution and monitored in time to ensure the reduction and
ultimately the elimination of the pollution source. However, the computational
complexity of resolving high-resolution wind fields makes it impractical to
test simulations with different time lengths and model configurations. This study
presents a preliminary development of a physics-informed super-resolution (SR)
generative adversarial network (GAN) that super-resolves the three-dimensional
(3D) low-resolution wind fields by a factor of nine (x9). We develop a pixel-wise
self-attention (PWA) module that learns 3D weather dynamics via a
self-attention computation followed by a 2D convolution. We also employ a loss
term that regularizes the self-attention map during pretraining, capturing the
vertical convection process from input wind data. The new PWA SR-GAN produces
high-fidelity super-resolved 3D wind data, learns wind structure in the
high-frequency domain, and reduces the computational cost of a high-resolution
wind simulation by a factor of 89.7.
( 2
min )
This paper introduces Structured Noise Space GAN (SNS-GAN), a novel approach
in the field of generative modeling specifically tailored for class-conditional
generation in both image and time series data. It addresses the challenge of
effectively integrating class labels into generative models without requiring
structural modifications to the network. The SNS-GAN method embeds class
conditions within the generator's noise space, simplifying the training process
and enhancing model versatility. The model's efficacy is demonstrated through
qualitative validations in the image domain and superior performance in time
series generation compared to baseline models. This research opens new avenues
for the application of GANs in various domains, including but not limited to
time series and image data generation.
( 2
min )
In this work, we study the problem of stability of Graph Convolutional Neural
Networks (GCNs) under random small perturbations in the underlying graph
topology, i.e. under a limited number of insertions or deletions of edges. We
derive a novel bound on the expected difference between the outputs of
unperturbed and perturbed GCNs. The proposed bound explicitly depends on the
magnitude of the perturbation of the eigenpairs of the Laplacian matrix, and
the perturbation explicitly depends on which edges are inserted or deleted.
Then, we provide a quantitative characterization of the effect of perturbing
specific edges on the stability of the network. We leverage tools from small
perturbation analysis to express the bounds in closed, albeit approximate,
form, in order to enhance interpretability of the results, without the need to
compute any perturbed shift operator. Finally, we numerically evaluate the
effectiveness of the proposed bound.
( 2
min )
We propose an energy-efficient equalizer for IM/DD systems based on spiking
neural networks. We optimize a neural spike encoding that boosts the
equalizer's performance while decreasing energy consumption.
( 2
min )
Deep reinforcement learning has advanced greatly and been applied in many areas.
In this paper, we explore the vulnerability of deep reinforcement learning by
proposing a novel generative model for creating effective adversarial examples
to attack the agent. Our proposed model can achieve both targeted attacks and
untargeted attacks. Considering the specificity of deep reinforcement learning,
we propose the action consistency ratio as a measure of stealthiness, and a new
measurement index of effectiveness and stealthiness. Experimental results show
that our method produces attacks that are both effective and stealthy compared
with other algorithms. Moreover, our methods are considerably faster
and thus can achieve rapid and efficient verification of the vulnerability of
deep reinforcement learning.
( 2
min )
Motivated by the interpretability question in ML models as a crucial element
for the successful deployment of AI systems, this paper focuses on rule
extraction as a means of neural network interpretability. Through a
systematic literature review, different approaches for extracting rules from
feedforward neural networks, an important block in deep learning models, are
identified and explored. The findings reveal a range of methods developed over
more than two decades, mostly suited to shallow neural networks, with recent
developments addressing the challenges of deep learning models. Rules offer a
transparent and intuitive means of explaining neural networks, making this
study a comprehensive introduction for researchers interested in the field.
While the study specifically addresses feedforward networks with supervised
learning and crisp rules, future work can extend to other network types,
machine learning methods, and fuzzy rule extraction.
( 2
min )
Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning. An exponential family can
either be normalized subtractively by its cumulant function or equivalently
normalized divisively by its partition function. Both subtractive and divisive
normalizers are strictly convex and smooth functions inducing pairs of Bregman
and Jensen divergences. It is well-known that skewed Bhattacharyya distances
between probability densities of an exponential family amount to skewed Jensen
divergences induced by the cumulant function between their corresponding
natural parameters, and that in limit cases the sided Kullback-Leibler
divergences amount to reverse-sided Bregman divergences. In this note, we first
show that the $\alpha$-divergences between unnormalized densities of an
exponential family amount to scaled $\alpha$-skewed Jensen divergences induced
by the partition function. We then show how comparative convexity with respect
to a pair of quasi-arithmetic means allows one to deform convex functions and
define dually flat spaces with corresponding divergences when ordinary
convexity is preserved.
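For reference, the two normalizations and the induced skewed Jensen divergence can be written explicitly; with sufficient statistic $t(x)$ and natural parameter $\theta$, the cumulant function $F$ and the partition function $Z$ are related by $F = \log Z$ (standard definitions, restated here for convenience):

```latex
p(x;\theta)
  = \exp\bigl(\langle \theta, t(x)\rangle - F(\theta)\bigr)
  = \frac{\exp\langle \theta, t(x)\rangle}{Z(\theta)},
\qquad Z(\theta) = e^{F(\theta)},
\qquad
J_F^{\alpha}(\theta_1,\theta_2)
  = \alpha F(\theta_1) + (1-\alpha)\,F(\theta_2)
  - F\bigl(\alpha\theta_1 + (1-\alpha)\theta_2\bigr).
```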
( 2
min )
This paper studies bandit problems where an agent has access to offline data
that might be utilized to potentially improve the estimation of each arm's
reward distribution. A major obstacle in this setting is the existence of
compound biases from the observational data. Ignoring these biases and blindly
fitting a model with the biased data could even negatively affect the online
learning phase. In this work, we formulate this problem from a causal
perspective. First, we categorize the biases into confounding bias and
selection bias based on the causal structure they imply. Next, we extract the
causal bound for each arm that is robust to compound biases from biased
observational data. The derived bounds contain the ground truth mean reward and
can effectively guide the bandit agent to learn a nearly-optimal decision
policy. We also conduct regret analysis in both contextual and non-contextual
bandit settings and show that prior causal bounds could help consistently
reduce the asymptotic regret.
( 2
min )
Graph clustering is a fundamental and challenging task in the field of graph
mining where the objective is to group the nodes into clusters taking into
consideration the topology of the graph. It has several applications in diverse
domains spanning social network analysis, recommender systems, computer vision,
and bioinformatics. In this work, we propose a novel method, DGCluster, which
primarily optimizes the modularity objective using graph neural networks and
scales linearly with the graph size. Our method does not require the number of
clusters to be specified as a part of the input and can also leverage the
availability of auxiliary node-level information. We extensively test DGCluster
on several real-world datasets of varying sizes, across multiple popular
cluster quality metrics. Our approach consistently outperforms the
state-of-the-art methods, demonstrating significant performance gains in almost
all settings.
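The abstract does not spell out the objective, but the standard (Newman) modularity that DGCluster primarily optimizes is straightforward to compute; a minimal pure-Python sketch for an undirected edge list:

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected graph given as an edge list
    and a node -> cluster-id mapping."""
    m = len(edges)                 # total number of edges
    deg = defaultdict(int)         # node degrees
    intra = defaultdict(int)       # edges inside each cluster
    deg_sum = defaultdict(float)   # total degree per cluster
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    for node, d in deg.items():
        deg_sum[community[node]] += d
    # Q = sum over clusters c of [ e_c / m - (d_c / 2m)^2 ]
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)

# Two disconnected triangles, clustered perfectly: Q = 0.5
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(edges, community))  # → 0.5
```

Optimizing Q directly over discrete assignments is NP-hard, which is why methods like DGCluster relax it into a differentiable objective over soft cluster memberships produced by a graph neural network.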
( 2
min )
Designing studies that apply causal discovery requires navigating many
researcher degrees of freedom. This complexity is exacerbated when the study
involves fMRI data. In this paper we (i) describe nine challenges that occur
when applying causal discovery to fMRI data, (ii) discuss the space of
decisions that need to be made, (iii) review how a recent case study made those
decisions, (iv) and identify existing gaps that could potentially be solved by
the development of new methods. Overall, causal discovery is a promising
approach for analyzing fMRI data, and multiple successful applications have
indicated that it is superior to traditional fMRI functional connectivity
methods, but current causal discovery methods for fMRI leave room for
improvement.
( 2
min )
Multi-fidelity Bayesian Optimisation (MFBO) has been shown to generally
converge faster than single-fidelity Bayesian Optimisation (SFBO) (Poloczek et
al. (2017)). Inspired by recent benchmark papers, we are investigating the
long-run behaviour of MFBO, based on observations in the literature that it
might under-perform in certain scenarios (Mikkola et al. (2023), Eggensperger
et al. (2021)). An under-performance of MFBO in the long run could
significantly undermine its application to many research tasks, especially when
we are not able to identify when the under-performance begins. We create a
simple benchmark study, showcase empirical results, and discuss scenarios and
possible reasons for under-performance.
( 2
min )
This work presents the PORTALS framework, which leverages surrogate modeling
and optimization techniques to enable the prediction of core plasma profiles
and performance with nonlinear gyrokinetic simulations at significantly reduced
cost, with no loss of accuracy. The efficiency of PORTALS is benchmarked
against standard methods, and its full potential is demonstrated on a unique,
simultaneous 5-channel (electron temperature, ion temperature, electron
density, impurity density and angular rotation) prediction of steady-state
profiles in a DIII-D ITER Similar Shape plasma with GPU-accelerated, nonlinear
CGYRO. This paper also provides general guidelines for accurate performance
predictions in burning plasmas and discusses the impact of transport modeling
in fusion pilot plant studies.
( 2
min )
Fairness AI aims to detect and alleviate bias across the entire AI
development life cycle, encompassing data curation, modeling, evaluation, and
deployment, a pivotal aspect of ethical AI implementation. To address data
bias, particularly concerning sensitive attributes like gender and race,
reweighting samples proves efficient for fairness AI. This paper contributes a
systematic examination of sample reweighting for traditional machine learning
(ML) models, employing five models for binary classification on the Adult
Income and COMPAS datasets with various protected attributes. The study
evaluates prediction results using five fairness metrics, uncovering the
nuanced and model-specific nature of the effectiveness of sample reweighting
in achieving fairness in traditional ML models, as well as revealing the
complexity of bias dynamics.
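The reweighting idea can be made concrete with the classic reweighing scheme of Kamiran and Calders, where each (group, label) cell receives weight P(A=a)P(Y=y)/P(A=a, Y=y), so that after weighting the protected attribute and the label look statistically independent. The paper's exact scheme is not given in this summary, so treat this as an illustrative sketch:

```python
from collections import Counter

def reweigh(groups, labels):
    """Per-sample fairness weights: w(a, y) = P(a) * P(y) / P(a, y).
    All weights equal 1 iff group and label are already independent."""
    n = len(groups)
    p_a = Counter(groups)
    p_y = Counter(labels)
    p_ay = Counter(zip(groups, labels))
    return [
        (p_a[a] / n) * (p_y[y] / n) / (p_ay[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]

# Group 0 is over-represented among positive labels, so its positive
# samples are down-weighted and group 1's are up-weighted:
groups = [0, 0, 1, 1]
labels = [1, 1, 0, 1]
print(reweigh(groups, labels))  # → [0.75, 0.75, 0.5, 1.5]
```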
( 2
min )
Motivated by recent work on lifelong learning applications for language
models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused
on code changes. Our contribution addresses a notable research gap marked by
the absence of a long-term temporal dimension in existing code change datasets,
limiting their suitability in lifelong learning scenarios. In contrast, our
dataset aims to comprehensively capture code changes across the entire release
history of open-source software repositories. In this work, we introduce an
initial version of CodeLL, comprising 71 machine-learning-based projects mined
from Software Heritage. This dataset enables the extraction and in-depth
analysis of code changes spanning 2,483 releases at both the method and API
levels. CodeLL enables researchers to study the behaviour of LMs in lifelong
fine-tuning settings for learning code changes. Additionally, the dataset can
help in studying data distribution shifts within software repositories and the
evolution of API usage over time.
( 2
min )
This paper explores the feasibility and performance of on-device large
language model (LLM) inference on various Apple iPhone models. Amidst the rapid
evolution of generative AI, on-device LLMs offer solutions to privacy,
security, and connectivity challenges inherent in cloud-based models.
Leveraging existing literature on running multi-billion parameter LLMs on
resource-limited devices, our study examines the thermal effects and
interaction speeds of a high-performing LLM across different smartphone
generations. We present real-world performance results, providing insights into
on-device inference capabilities.
( 2
min )
Neural construction models have shown promising performance for Vehicle
Routing Problems (VRPs) by adopting either the Autoregressive (AR) or
Non-Autoregressive (NAR) learning approach. While AR models produce
high-quality solutions, they generally have a high inference latency due to
their sequential generation nature. Conversely, NAR models generate solutions
in parallel with a low inference latency but generally exhibit inferior
performance. In this paper, we propose a generic Guided Non-Autoregressive
Knowledge Distillation (GNARKD) method to obtain high-performance NAR models
having a low inference latency. GNARKD removes the constraint of sequential
generation in AR models while preserving the learned pivotal components in the
network architecture to obtain the corresponding NAR models through knowledge
distillation. We evaluate GNARKD by applying it to three widely adopted AR
models to obtain NAR VRP solvers for both synthesized and real-world instances.
The experimental results demonstrate that GNARKD significantly reduces the
inference time (4-5 times faster) with an acceptable performance drop (2-3%).
To the best of our knowledge, this study is the first of its kind to obtain NAR
VRP solvers from AR ones through knowledge distillation.
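GNARKD's distillation loss is not spelled out in this summary, but knowledge distillation generally trains the student to match the teacher's temperature-softened output distribution; a generic sketch of that loss (not GNARKD's exact formulation):

```python
from math import exp, log

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    Zero iff the student exactly matches the teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

A higher temperature softens both distributions, exposing the teacher's relative preferences among non-top choices, which is the signal the student learns from.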
( 2
min )
We present a study on the integration of Large Language Models (LLMs) in
tabular data classification, emphasizing an efficient framework. Building upon
existing work done in TabLLM (arXiv:2210.10723), we introduce three novel
serialization techniques, including the standout LaTeX serialization method.
This method significantly boosts the performance of LLMs in processing
domain-specific datasets. Our method stands out for its memory efficiency and
ability to fully utilize complex data structures. Through extensive
experimentation, including various serialization approaches like feature
combination and importance, we demonstrate our work's superiority in accuracy
and efficiency over traditional models.
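The paper's exact LaTeX serialization format is not given in this summary; purely as an illustration, serializing one tabular row into the LaTeX string fed to the LLM might look like the following (the field names and layout here are hypothetical):

```python
def to_latex_table(row):
    """Serialize one feature dict into a small LaTeX tabular environment.
    A hypothetical format for illustration; the paper's may differ."""
    cols = "l" * len(row)
    header = " & ".join(row.keys()) + r" \\ \hline"
    values = " & ".join(str(v) for v in row.values()) + r" \\"
    return "\n".join(
        [r"\begin{tabular}{" + cols + "}", header, values, r"\end{tabular}"]
    )

sample = {"age": 39, "education": "Bachelors", "hours-per-week": 40}
print(to_latex_table(sample))
```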
( 2
min )
Drivers can sustain serious injuries in traffic accidents. In this study,
traffic crashes on Florida's Interstate-95 from 2016 to 2021 were gathered, and
several classification methods were used to estimate the severity of driver
injuries. For feature selection, logistic regression was applied. To compare
model performances, various assessment metrics such as accuracy, recall, and
area under the curve (AUC) were computed. The AdaBoost algorithm
outperformed the others in terms of recall and AUC. SHAP values were also
generated to explain the classification model's results. This analytical study
can be used to examine factors that contribute to the severity of driver
injuries in crashes.
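Of the metrics mentioned, AUC is the least self-explanatory: it equals the probability that a randomly chosen positive case is ranked above a randomly chosen negative one. The following pure-Python sketch computes it directly from that definition (fine for small evaluation sets; a study like this would use a library implementation):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```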
( 2
min )
This paper presents a novel approach for analysing EEG data from drivers in a
simulated driving test. We focused on the Hurst exponent, Shannon entropy, and
fractal dimension as markers of the nonlinear dynamics of the brain. The
results show significant trends: Shannon Entropy and Fractal Dimension exhibit
variations during driving condition transitions, whereas the Hurst exponent
reflects memory retention, portraying learning patterns. These findings
suggest that the tools of Non-linear Dynamical (NLD) theory can serve as
indicators of cognitive state and driving-memory changes for assessing driver
performance and advancing the understanding of the non-linear dynamics of human
cognition in the context of driving and beyond. Our study reveals the potential
of NLD tools to elucidate
brain state and system variances, enabling their integration into current Deep
Learning and Machine Learning models. This integration can extend beyond
driving applications and be harnessed for cognitive learning, thereby improving
overall productivity and accuracy levels.
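Of the three markers, Shannon entropy is the simplest to make concrete: over a discretized (symbolized) signal it measures how spread out the amplitude distribution is. A minimal sketch, noting that the study's exact discretization of the EEG signal is not specified here:

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Shannon entropy in bits of a discretized signal."""
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())

print(shannon_entropy([0, 1, 0, 1]))  # → 1.0 (two equiprobable symbols)
print(shannon_entropy([0, 0, 0, 0]))  # → 0.0 (constant signal)
```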
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
We propose a simple and general framework for nonparametric estimation of
heterogeneous treatment effects under fairness constraints. Under standard
regularity conditions, we show that the resulting estimators possess the double
robustness property. We use this framework to characterize the trade-off
between fairness and the maximum welfare achievable by the optimal policy. We
evaluate the methods in a simulation study and illustrate them in a real-world
case study.
( 2
min )
Researchers are increasingly turning to machine learning (ML) algorithms to
investigate causal heterogeneity in randomized experiments. Despite their
promise, ML algorithms may fail to accurately ascertain heterogeneous treatment
effects under practical settings with many covariates and small sample size. In
addition, the quantification of estimation uncertainty remains a challenge. We
develop a general approach to statistical inference for heterogeneous treatment
effects discovered by a generic ML algorithm. We apply Neyman's repeated
sampling framework to a common setting, in which researchers use an ML
algorithm to estimate the conditional average treatment effect and then divide
the sample into several groups based on the magnitude of the estimated effects.
We show how to estimate the average treatment effect within each of these
groups, and construct a valid confidence interval. In addition, we develop
nonparametric tests of treatment effect homogeneity across groups, and
rank-consistency of within-group average treatment effects. The validity of our
methodology does not rely on the properties of ML algorithms because it is
solely based on the randomization of treatment assignment and random sampling
of units. Finally, we generalize our methodology to the cross-fitting procedure
by accounting for the additional uncertainty induced by the random splitting of
data.
( 3
min )
Recent advances in practical quantum computing have led to a variety of
cloud-based quantum computing platforms that allow researchers to evaluate
their algorithms on noisy intermediate-scale quantum (NISQ) devices. A common
property of quantum computers is that they can exhibit instances of true
randomness as opposed to pseudo-randomness obtained from classical systems.
Investigating the effects of such true quantum randomness in the context of
machine learning is appealing, and recent results vaguely suggest that benefits
can indeed be achieved from the use of quantum random numbers. To shed some
more light on this topic, we empirically study the effects of hardware-biased
quantum random numbers on the initialization of artificial neural network
weights in numerical experiments. We find no statistically significant
difference in comparison with unbiased quantum random numbers as well as biased
and unbiased random numbers from a classical pseudo-random number generator.
The quantum random numbers for our experiments are obtained from real quantum
hardware.
( 2
min )
The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence of great demand.
We initiate the study of data-driven AES selection for online
experimentation services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model considering the heteroskedasticity across
experiments, and it seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
( 2
min )
Measurement-based quantum computation (MBQC) is a paradigm for quantum
computation where computation is driven by local measurements on a suitably
entangled resource state. In this work we show that MBQC is related to a model
of quantum computation based on Clifford quantum cellular automata (CQCA).
Specifically, we show that certain MBQCs can be directly constructed from CQCAs
which yields a simple and intuitive circuit model representation of MBQC in
terms of quantum computation based on CQCA. We apply this description to
construct various MBQC-based Ansätze for parameterized quantum circuits,
demonstrating that the different Ansätze may lead to significantly different
performances on different learning tasks. In this way, MBQC yields a family of
hardware-efficient Ansätze that may be adapted to specific problem settings
and is particularly well suited for architectures with translationally
invariant gates such as neutral atoms.
( 2
min )
We establish explicit dynamics for neural networks whose training objective
has a regularising term that constrains the parameters to remain close to their
initial value. This keeps the network in a lazy training regime, where the
dynamics can be linearised around the initialisation. The standard neural
tangent kernel (NTK) governs the evolution during training in the
infinite-width limit, although the regularisation causes an additional term to
appear in the differential equation describing the dynamics. This setting
provides an appropriate framework to study the evolution of wide networks
trained to optimise generalisation objectives such as PAC-Bayes bounds, and
hence potentially contribute to a deeper theoretical understanding of such
networks.
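Concretely, writing $\theta_0$ for the initial parameters and adding a proximity penalty $\frac{\lambda}{2}\|\theta-\theta_0\|^2$ to the training loss $L$, the gradient flow and its linearisation read (a sketch of the standard lazy-training argument, not the paper's exact statement):

```latex
\dot{\theta}_t = -\nabla_\theta L(f_{\theta_t}) - \lambda\,(\theta_t - \theta_0),
\qquad
f_\theta(x) \approx f_{\theta_0}(x)
  + \nabla_\theta f_{\theta_0}(x)^{\top}(\theta - \theta_0),
```

so that in function space $\dot f_t = -K_{\theta_0}\,\nabla_{\!f} L(f_t) - \lambda\,(f_t - f_{\theta_0})$, where $K_{\theta_0}$ is the NTK at initialisation; the $-\lambda\,(f_t - f_{\theta_0})$ term is the additional contribution from the regularisation.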
( 2
min )
Great customer experience provides a competitive edge and helps create brand differentiation. As per the Forrester report, The State Of Customer Obsession, 2022, being customer-first can make a sizable impact on an organization’s balance sheet, as organizations embracing this methodology are surpassing their peers in revenue growth. Despite contact centers being under constant pressure to […]
( 10
min )
I asked DALL-E3 (via chatgpt) for "a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child who is learning to read."
"Please generate a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
AI made a splash this year — from Wall Street to the U.S. Congress — driven by a wave of developers aiming to make the world better. Here’s a look at AI in 2023 across agriculture, natural disasters, medicine and other areas worthy of a cocktail party conversation. This AI Is on Fire California has […]
( 7
min )
Time to gear up, hunters — Capcom’s Monster Hunter: World joins the GeForce NOW library, bringing members the ultimate hunting experience on any device. It’s all part of an adventurous week, with nearly a dozen new games joining the cloud gaming service. A Whole New World Join the Fifth Fleet on an epic adventure to […]
( 6
min )
We propose a Reinforcement-Learning-based system that would automatically
prescribe medications to a hypothetical patient that may help the patient with
their mental-health-related speech disfluency, and adjust the medication and
the dosages in response to data from the patient. We demonstrate the components
of the system: a module that detects and evaluates speech disfluency on a large
dataset we built, and a Reinforcement Learning algorithm that automatically
finds good combinations of medications. To support the two modules, we collect
data on the effect of psychiatric medications for speech disfluency from the
literature, and build a plausible patient simulation system. We demonstrate
that the Reinforcement Learning system is, under some circumstances, able to
converge to a good medication regime. We collect and label a dataset of people
with possible speech disfluency and demonstrate our methods using that dataset.
Our work is a proof of concept: we show that there is promise in the idea of
using automatic data collection to address disfluency.
( 2
min )
We present XLand-MiniGrid, a suite of tools and grid-world environments for
meta-reinforcement learning research inspired by the diversity and depth of
XLand and the simplicity and minimalism of MiniGrid. XLand-MiniGrid is written
in JAX, designed to be highly scalable, and can potentially run on GPU or TPU
accelerators, democratizing large-scale experimentation with limited resources.
To demonstrate the generality of our library, we have implemented some
well-known single-task environments as well as new meta-learning environments
capable of generating $10^8$ distinct tasks. We have empirically shown that the
proposed environments can scale up to $2^{13}$ parallel instances on the GPU,
reaching tens of millions of steps per second.
( 2
min )
Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned reinforcement
learning (GCRL) has been employed to tackle this difficult problem via a
curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is
essential for the agent to ultimately find the pathway to the desired goal. How
to explore novel sub-goals efficiently is one of the most challenging issues in
GCRL. Several goal exploration methods have been proposed to address this issue
but still struggle to find the desired goals efficiently. In this paper, we
propose a novel learning objective by optimizing the entropy of both achieved
and new goals to be explored for more efficient goal exploration in
sub-goal-selection-based GCRL. To optimize this objective, we first explore and
exploit
the frequently occurring goal-transition patterns mined in the environments
similar to the current task to compose skills via skill learning. Then, the
pretrained skills are applied in goal exploration. Evaluation on a variety of
sparse-reward long-horizon benchmark tasks suggests that incorporating our
method into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance. The
source code is available at: https://github.com/GEAPS/GEAPS.
( 3
min )
We train a language model (LM) to robustly answer multistep questions by
generating and answering sub-questions. We propose Chain-of-Questions, a
framework that trains a model to generate sub-questions and sub-answers one at
a time by leveraging human annotated question decomposition meaning
representation (QDMR). The key technical challenge is that QDMR only contains
sub-questions but not answers to those sub-questions, so we treat sub-answers
as latent variables and optimize them using a novel dynamic mixture of Hard-EM
and MAPO. Chain-of-Questions greatly outperforms strong neuro-symbolic methods
by 9.0 F1 on DROP contrast set, and outperforms GPT-3.5 by 24.3 F1 on HOTPOTQA
adversarial set, thus demonstrating the effectiveness and robustness of our
framework.
( 2
min )
The number of people suffering from various levels of hearing loss reached
1.57 billion in 2019. These people face challenges on many personal and
professional levels and urgently need to be fully included in society. This
paper presents a proof of concept of an automatic sign language recognition
system based on data obtained using a wearable device with three flex sensors.
The system is designed to interpret a selected set of American Sign
Language (ASL) dynamic words by collecting data in sequences of the performed
signs and using machine learning methods. The built models achieved
high-quality performances, such as Random Forest with 99% accuracy, Support
Vector Machine (SVM) with 99%, and two K-Nearest Neighbor (KNN) models with
98%. This indicates many possible paths toward the development of a full-scale
system.
( 2
min )
Diffusion models have demonstrated strong potential for robotic trajectory
planning. However, generating coherent and long-horizon trajectories from
high-level instructions remains challenging, especially for complex tasks
requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end
hierarchical planning framework integrating interpretable skill learning with
conditional diffusion planning to address this problem. At the higher level,
the skill abstraction module learns discrete, human-understandable skill
representations from visual observations and language instructions. These
learned skill embeddings are then used to condition the diffusion model to
generate customized latent trajectories aligned with the skills. It allows for
generating diverse state trajectories that adhere to the learnable skills. By
integrating skill learning with conditional trajectory generation,
SkillDiffuser produces coherent behavior following abstract instructions across
diverse tasks. Experiments on multi-task robotic manipulation benchmarks like
Meta-World and LOReL demonstrate state-of-the-art performance and
human-interpretable skill representations from SkillDiffuser.
( 2
min )
Legged locomotion is arguably the most suited and versatile mode to deal with
natural or unstructured terrains. Intensive research into dynamic walking and
running controllers has recently yielded great advances, both in the optimal
control and reinforcement learning (RL) literature. Hopping is a challenging
dynamic task involving a flight phase and has the potential to increase the
traversability of legged robots. Model based control for hopping typically
relies on accurate detection of different jump phases, such as lift-off or
touch down, and using different controllers for each phase. In this paper, we
present a end-to-end RL based torque controller that learns to implicitly
detect the relevant jump phases, removing the need to provide manual heuristics
for state detection. We also extend a method for simulation to reality transfer
of the learned controller to contact rich dynamic tasks, resulting in
successful deployment on the robot after training without parameter tuning.
( 3
min )
Recently, large language models (LLMs) have made remarkable progress in
natural language processing. The most representative ability of LLMs is
in-context learning (ICL), which enables LLMs to learn patterns from in-context
exemplars without training. The performance of ICL greatly depends on the
exemplars used. However, how to choose exemplars remains unclear due to the
lack of understanding of how in-context learning works. In this paper, we
present a novel perspective on ICL by conceptualizing it as contextual
retrieval from a model of associative memory. We establish a theoretical
framework of ICL based on Hopfield Networks. Based on our framework, we look
into how in-context exemplars influence the performance of ICL and propose more
efficient active exemplar selection. Our study sheds new light on the mechanism
of ICL by connecting it to memory retrieval, with potential implications for
advancing the understanding of LLMs.
( 2
min )
Lipschitz-constrained neural networks have several advantages over
unconstrained ones and can be applied to a variety of problems, making them a
topic of attention in the deep learning community. Unfortunately, it has been
shown both theoretically and empirically that they perform poorly when equipped
with ReLU activation functions. By contrast, neural networks with learnable
1-Lipschitz linear splines are known to be more expressive. In this paper, we
show that such networks correspond to global optima of a constrained functional
optimization problem that consists of the training of a neural network composed
of 1-Lipschitz linear layers and 1-Lipschitz freeform activation functions with
second-order total-variation regularization. Further, we propose an efficient
method to train these neural networks. Our numerical experiments show that our
trained networks compare favorably with existing 1-Lipschitz neural
architectures.
( 2
min )
In this paper, we explore transferability in learning between different
attack classes in a network intrusion detection setup. We evaluate
transferability of attack classes by training a deep learning model with a
specific attack class and testing it on a separate attack class. We observe the
effects of real and synthetically generated data augmentation techniques on
transferability. We investigate the nature of observed transferability
relationships, which can be either symmetric or asymmetric. We also examine
explainability of the transferability relationships using the recursive feature
elimination algorithm. We study data preprocessing techniques to boost model
performance. The code for this work can be found at
https://github.com/ghosh64/transferability.
( 2
min )
In this work we develop a novel approach using deep neural networks to
reconstruct the conductivity distribution in elliptic problems from one
measurement of the solution over the whole domain. The approach is based on a
mixed reformulation of the governing equation and utilizes the standard
least-squares objective, with deep neural networks as ansatz functions to
approximate the conductivity and flux simultaneously. We provide a thorough
analysis of the deep neural network approximations of the conductivity for both
continuous and empirical losses, including rigorous error estimates that are
explicit in terms of the noise level, various penalty parameters and neural
network architectural parameters (depth, width and parameter bound). We also
provide multiple numerical experiments in two- and multi-dimensions to
illustrate distinct features of the approach, e.g., excellent stability with
respect to data noise and capability of solving high-dimensional problems.
( 2
min )
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
( 2
min )
In this study, we propose a new activation function, called Adaptive Smooth
Activation Unit (ASAU), tailored for optimized gradient propagation, thereby
enhancing the proficiency of convolutional networks in medical image analysis.
We apply this new activation function to two important and commonly used
general tasks in medical image analysis: automatic disease diagnosis and organ
segmentation in CT and MRI. Our rigorous evaluation on the RadImageNet
abdominal/pelvis (CT and MRI) dataset and Liver Tumor Segmentation Benchmark
(LiTS) 2017 demonstrates that our ASAU-integrated frameworks not only achieve a
substantial (4.80\%) improvement over ReLU in classification accuracy (disease
detection) on abdominal CT and MRI but also achieves 1\%-3\% improvement in
dice coefficient compared to widely used activations for `healthy liver tissue'
segmentation. These improvements offer new baselines for developing a
diagnostic tool, particularly for complex, challenging pathologies. The
superior performance and adaptability of ASAU highlight its potential for
integration into a wide range of image classification and segmentation tasks.
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
( 2
min )
Recent work by Marino et al. (2020) showed improved performance in sequential
density estimation by combining masked autoregressive flows with hierarchical
latent variable models. We draw a connection between such autoregressive
generative models and the task of lossy video compression. Specifically, we
view recent neural video compression methods (Lu et al., 2019; Yang et al.,
2020b; Agustssonet al., 2020) as instances of a generalized stochastic temporal
autoregressive transform, and propose avenues for enhancement based on this
insight. Comprehensive evaluations on large-scale video data show improved
rate-distortion performance over both state-of-the-art neural and conventional
video compression methods.
( 2
min )
Diffusion-based generative models represent the current state-of-the-art for
image generation. However, standard diffusion models are based on Euclidean
geometry and do not translate directly to manifold-valued data. In this work,
we develop extensions of both score-based generative models (SGMs) and
Denoising Diffusion Probabilistic Models (DDPMs) to the Lie group of 3D
rotations, SO(3). SO(3) is of particular interest in many disciplines such as
robotics, biochemistry and astronomy/cosmology science. Contrary to more
general Riemannian manifolds, SO(3) admits a tractable solution to heat
diffusion, and allows us to implement efficient training of diffusion models.
We apply both SO(3) DDPMs and SGMs to synthetic densities on SO(3) and
demonstrate state-of-the-art results. Additionally, we demonstrate the
practicality of our model on pose estimation tasks and in predicting correlated
galaxy orientations for astrophysics/cosmology.
( 2
min )
As large language models (LLMs) like ChatGPT have gained traction, an
increasing number of news websites have begun utilizing them to generate
articles. However, not only can these language models produce factually
inaccurate articles on reputable websites but disreputable news sites can
utilize LLMs to mass produce misinformation. To begin to understand this
phenomenon, we present one of the first large-scale studies of the prevalence
of synthetic articles within online news media. To do this, we train a
DeBERTa-based synthetic news detector and classify over 15.90 million articles
from 3,074 misinformation and mainstream news websites. We find that between
January 1, 2022, and May 1, 2023, the relative number of synthetic news
articles increased by 55.4% on mainstream websites while increasing by 457% on
misinformation sites. We find that this increase is largely driven by smaller
less popular websites. Analyzing the impact of the release of ChatGPT using an
interrupted-time-series, we show that while its release resulted in a marked
increase in synthetic articles on small sites as well as misinformation news
websites, there was not a corresponding increase on large mainstream news
websites.
( 3
min )
Powered by new advances in sensor development and artificial intelligence,
the decreasing cost of computation, and the pervasiveness of handheld
computation devices, biometric user authentication (and identification) is
rapidly becoming ubiquitous. Modern approaches to biometric authentication,
based on sophisticated machine learning techniques, cannot avoid storing either
trained-classifier details or explicit user biometric data, thus exposing
users' credentials to falsification. In this paper, we introduce a secure way
to handle user-specific information involved with the use of vector-space
classifiers or artificial neural networks for biometric authentication. Our
proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the
coupling of pre-existing classifiers with fuzzy extractors, through a
artificial-neural-network-based buffer called an expander, with minimal or no
performance degradation. The NFE thus offers all the performance advantages of
modern deep-learning-based classifiers, and all the security of standard fuzzy
extractors. We demonstrate the NFE retrofit to a classic artificial neural
network for a simple scenario of fingerprint-based user authentication.
( 3
min )
This paper presents the computational challenge on topological deep learning
that was hosted within the ICML 2023 Workshop on Topology and Geometry in
Machine Learning. The competition asked participants to provide open-source
implementations of topological neural networks from the literature by
contributing to the python packages TopoNetX (data processing) and TopoModelX
(deep learning). The challenge attracted twenty-eight qualifying submissions in
its two-month duration. This paper describes the design of the challenge and
summarizes its main findings.
( 2
min )
Objective: Early identification of ADHD is necessary to provide the
opportunity for timely treatment. However, screening the symptoms of ADHD on a
large scale is not easy. This study aimed to validate a video game (FishFinder)
for the screening of ADHD using objective measurement of the core symptoms of
this disorder. Method: The FishFinder measures attention and impulsivity
through in-game performance and evaluates the child's hyperactivity using
smartphone motion sensors. This game was tested on 26 children with ADHD and 26
healthy children aged 5 to 12 years. A Support Vector Machine was employed to
detect children with ADHD. results: This system showed 92.3% accuracy, 90%
sensitivity, and 93.7% specificity using a combination of in-game and movement
features. Conclusions: The FishFinder demonstrated a strong ability to identify
ADHD in children. So, this game can be used as an affordable, accessible, and
enjoyable method for the objective screening of ADHD.
( 2
min )
Accurately predicting line loss rates is vital for effective line loss
management in distribution networks, especially over short-term multi-horizons
ranging from one hour to one week. In this study, we propose
Attention-GCN-LSTM, a novel method that combines Graph Convolutional Networks
(GCN), Long Short-Term Memory (LSTM), and a three-level attention mechanism to
address this challenge. By capturing spatial and temporal dependencies, our
model enables accurate forecasting of line loss rates across multiple horizons.
Through comprehensive evaluation using real-world data from 10KV feeders, our
Attention-GCN-LSTM model consistently outperforms existing algorithms,
exhibiting superior performance in terms of prediction accuracy and
multi-horizon forecasting. This model holds significant promise for enhancing
line loss management in distribution networks.
( 2
min )
Text classification is an important topic in the field of natural language
processing. It has been preliminarily applied in information retrieval, digital
library, automatic abstracting, text filtering, word semantic discrimination
and many other fields. The aim of this research is to use a variety of
algorithms to test the ability to identify offensive posts and evaluate their
performance against a variety of assessment methods. The motivation for this
project is to reduce the harm of these languages to human censors by automating
the screening of offending posts. The field is a new one, and despite much
interest in the past two years, there has been no focus on the object of the
offence. Through the experiment of this project, it should inspire future
research on identification methods as well as identification content.
( 2
min )
Causal discovery with latent variables is a crucial but challenging task.
Despite the emergence of numerous methods aimed at addressing this challenge,
they are not fully identified to the structure that two observed variables are
influenced by one latent variable and there might be a directed edge in
between. Interestingly, we notice that this structure can be identified through
the utilization of higher-order cumulants. By leveraging the higher-order
cumulants of non-Gaussian data, we provide an analytical solution for
estimating the causal coefficients or their ratios. With the estimated (ratios
of) causal coefficients, we propose a novel approach to identify the existence
of a causal edge between two observed variables subject to latent variable
influence. In case when such a causal edge exits, we introduce an asymmetry
criterion to determine the causal direction. The experimental results
demonstrate the effectiveness of our proposed method.
( 2
min )
De novo peptide sequencing from mass spectrometry (MS) data is a critical
task in proteomics research. Traditional de novo algorithms have encountered a
bottleneck in accuracy due to the inherent complexity of proteomics data. While
deep learning-based methods have shown progress, they reduce the problem to a
translation task, potentially overlooking critical nuances between spectra and
peptides. In our research, we present ContraNovo, a pioneering algorithm that
leverages contrastive learning to extract the relationship between spectra and
peptides and incorporates the mass information into peptide decoding, aiming to
address these intricacies more efficiently. Through rigorous evaluations on two
benchmark datasets, ContraNovo consistently outshines contemporary
state-of-the-art solutions, underscoring its promising potential in enhancing
de novo peptide sequencing. The source code is available at
https://github.com/BEAM-Labs/ContraNovo.
( 2
min )
In this paper, we focus on the prediction phase of a random forest and study
the problem of representing a bag of decision trees using a smaller bag of
decision trees, where we only consider binary decision problems on the binary
domain and simple decision trees in which an internal node is limited to
querying the Boolean value of a single variable. As a main result, we show that
the majority function of $n$ variables can be represented by a bag of $T$ ($<
n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$
and $T$ must be odd (in order to avoid the tie break). We also show that a bag
of $n$ decision trees can be represented by a bag of $T$ decision trees each
with polynomial size if $n-T$ is a constant and a small classification error is
allowed. A related result on the $k$-out-of-$n$ functions is presented too.
( 2
min )
Drawing on theoretical insights, we advocate an error-based thresholding
(EBT) mechanism for learned ISTA (LISTA), which utilizes a function of the
layer-wise reconstruction error to suggest a specific threshold for each
observation in the shrinkage function of each layer. We show that the proposed
EBT mechanism well disentangles the learnable parameters in the shrinkage
functions from the reconstruction errors, endowing the obtained models with
improved adaptivity to possible data variations. With rigorous analyses, we
further show that the proposed EBT also leads to a faster convergence on the
basis of LISTA or its variants, in addition to its higher adaptivity. Extensive
experimental results confirm our theoretical analyses and verify the
effectiveness of our methods.
( 2
min )
Causal Structure Learning (CSL), amounting to extracting causal relations
among the variables in a dataset, is widely perceived as an important step
towards robust and transparent models. Constraint-based CSL leverages
conditional independence tests to perform causal discovery. We propose
Shapley-PC, a novel method to improve constraint-based CSL algorithms by using
Shapley values over the possible conditioning sets to decide which variables
are responsible for the observed conditional (in)dependences. We prove
soundness and asymptotic consistency and demonstrate that it can outperform
state-of-the-art constraint-based, search-based and functional causal
model-based methods, according to standard metrics in CSL.
( 2
min )
A large body of NLP research has documented the ways gender biases manifest
and amplify within large language models (LLMs), though this research has
predominantly operated within a gender binary-centric context. A growing body
of work has identified the harmful limitations of this gender-exclusive
framing; many LLMs cannot correctly and consistently refer to persons outside
the gender binary, especially if they use neopronouns. While data scarcity has
been identified as a possible culprit, the precise mechanisms through which it
influences LLM misgendering remain underexplored. Our work addresses this gap
by studying data scarcity's role in subword tokenization and, consequently, the
formation of LLM word representations. We uncover how the Byte-Pair Encoding
(BPE) tokenizer, a backbone for many popular LLMs, contributes to neopronoun
misgendering through out-of-vocabulary behavior. We introduce pronoun
tokenization parity (PTP), a novel approach to reduce LLM neopronoun
misgendering by preserving a token's functional structure. We evaluate PTP's
efficacy using pronoun consistency-based metrics and a novel syntax-based
metric. Through several controlled experiments, finetuning LLMs with PTP
improves neopronoun consistency from 14.5% to 58.4%, highlighting the
significant role tokenization plays in LLM pronoun consistency.
( 3
min )
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory
disease that significantly impacts the quality of life of affected individuals.
This paper presents COPDFlowNet, a novel deep-learning framework that leverages
a custom Generative Adversarial Network (GAN) to generate synthetic
Computational Fluid Dynamics (CFD) velocity flow field images specific to the
trachea of COPD patients. These synthetic images serve as a valuable resource
for data augmentation and model training. Additionally, COPDFlowNet
incorporates a custom Convolutional Neural Network (CNN) architecture to
predict the location of the obstruction site.
( 2
min )
In recent years, simulations of pedestrians using the multi-agent
reinforcement learning (MARL) have been studied. This study considered the
roads on a grid-world environment, and implemented pedestrians as MARL agents
using an echo-state network and the least squares policy iteration method.
Under this environment, the ability of these agents to learn to move forward by
avoiding other agents was investigated. Specifically, we considered two types
of tasks: the choice between a narrow direct route and a broad detour, and the
bidirectional pedestrian flow in a corridor. The simulations results indicated
that the learning was successful when the density of the agents was not that
high.
( 2
min )
Multi-document summarization is the process of automatically generating a
concise summary of multiple documents related to the same topic. This summary
can help users quickly understand the key information from a large collection
of documents. Multi-document summarization systems are more complex than
single-document summarization systems due to the need to identify and combine
information from multiple sources. In this paper, we have developed a machine
learning model that generates a concise summary of a topic from multiple news
documents. The model is designed to be unbiased by sampling its input equally
from all the different aspects of the topic, even if the majority of the news
sources lean one way.
( 2
min )
Graph Neural Networks are notorious for its memory consumption. A recent
Transformer based GNN called Graph Transformer are shown to obtain superior
performances when long range dependencies exist. However, combining graph data
and Transformer architecture led to a combinationally worse memory issue. We
propose a novel version of "edge regularization technique" that alleviates the
need for Positional Encoding and ultimately alleviate GT's out of memory issue.
We observe that it is not clear whether having an edge regularization on top of
positional encoding is helpful. However, it seems evident when no positional
encoding is applied, edge regularization technique indeed stably improves GT's
performance.
( 2
min )
In this work we introduce Labrador, a pre-trained Transformer model for
laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million
lab test results from electronic health records (EHRs) and evaluated on various
downstream outcome prediction tasks. Both models demonstrate mastery of the
pre-training task but neither consistently outperform XGBoost on downstream
supervised tasks. Our ablation studies reveal that transfer learning shows
limited effectiveness for BERT and achieves marginal success with Labrador. We
explore the reasons for the failure of transfer learning and suggest that the
data generating process underlying each patient cannot be characterized
sufficiently using labs alone, among other factors. We encourage future work to
focus on joint modeling of multiple EHR data categories and to include
tree-based baselines in their evaluations.
( 2
min )
Graph-based collaborative filtering methods have prevailing performance for
recommender systems since they can capture high-order information between users
and items, in which the graphs are constructed from the observed user-item
interactions that might miss links or contain spurious positive interactions in
industrial scenarios. The Bayesian Graph Neural Network framework approaches
this issue with generative models for the interaction graphs. The critical
problem is to devise a proper family of graph generative models tailored to
recommender systems. We propose an efficient generative model that jointly
considers the preferences of users, the concurrence of items and some important
graph structure information. Experiments on four popular benchmark datasets
demonstrate the effectiveness of our proposed graph generative methods for
recommender systems.
( 2
min )
Motivated by applications in queueing theory, we consider a stochastic
control problem whose state space is the $d$-dimensional positive orthant. The
controlled process $Z$ evolves as a reflected Brownian motion whose covariance
matrix is exogenously specified, as are its directions of reflection from the
orthant's boundary surfaces. A system manager chooses a drift vector
$\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at
time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem
formulation, the objective is to minimize expected discounted cost over an
infinite planning horizon, after which we treat the corresponding ergodic
control problem. Extending earlier work by Han et al. (Proceedings of the
National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a
simulation-based computational method that relies heavily on deep neural
network technology. For test problems studied thus far, our method is accurate
to within a fraction of one percent, and is computationally feasible in
dimensions up to at least $d=30$.
( 2
min )
We provide a systematic investigation of using physics-informed neural
networks to compute Lyapunov functions. We encode Lyapunov conditions as a
partial differential equation (PDE) and use this for training neural network
Lyapunov functions. We analyze the analytical properties of the solutions to
the Lyapunov and Zubov PDEs. In particular, we show that employing the Zubov
equation in training neural Lyapunov functions can lead to approximate regions
of attraction close to the true domain of attraction. We also examine
approximation errors and the convergence of neural approximations to the unique
solution of Zubov's equation. We then provide sufficient conditions for the
learned neural Lyapunov functions that can be readily verified by
satisfiability modulo theories (SMT) solvers, enabling formal verification of
both local stability analysis and region-of-attraction estimates in the large.
Through a number of nonlinear examples, ranging from low to high dimensions, we
demonstrate that the proposed framework can outperform traditional
sums-of-squares (SOS) Lyapunov functions obtained using semidefinite
programming (SDP).
( 2
min )
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} +
b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in
\mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$.
On $n$ parallel processors, the computation of $n$ elements incurs
$\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form
are ubiquitous in science and engineering, making efficient parallelization
useful for a vast number of applications. We implement our expression in
software, test it on parallel hardware, and verify that it executes faster than
sequential computation by a factor of $\frac{n}{\log n}$.
( 2
min )
Air pollution is a result of multiple sources including both natural and
anthropogenic activities. The rapid urbanization of the cities such as
Bujumbura economic capital of Burundi, is one of these factors. The very first
characterization of the spatio-temporal variability of PM2.5 in Bujumbura and
the forecasting of PM2.5 concentration have been conducted in this paper using
data collected during a year, from august 2022 to august 2023, by low cost
sensors installed in Bujumbura city. For each commune, an hourly, daily and
seasonal analysis were carried out and the results showed that the mass
concentrations of PM2.5 in the three municipalities differ from one commune to
another. The average hourly and annual PM2.5 concentrations exceed the World
Health Organization standards. The range is between 28.3 and 35.0 microgram/m3
. In order to make prediction of PM2.5 concentration, an investigation of RNN
with Long Short Term Memory (LSTM) has been undertaken.
( 2
min )
Over the last decade, the Dip-test of unimodality has gained increasing
interest in the data mining community as it is a parameter-free statistical
test that reliably rates the modality in one-dimensional samples. It returns a
so called Dip-value and a corresponding probability for the sample's
unimodality (Dip-p-value). These two values share a sigmoidal relationship.
However, the specific transformation is dependent on the sample size. Many
Dip-based clustering algorithms use bootstrapped look-up tables translating
Dip- to Dip-p-values for a certain limited amount of sample sizes. We propose a
specifically designed sigmoid function as a substitute for these
state-of-the-art look-up tables. This accelerates computation and provides an
approximation of the Dip- to Dip-p-value transformation for every single sample
size. Further, it is differentiable and can therefore easily be integrated in
learning schemes using gradient descent. We showcase this by exploiting our
function in a novel subspace clustering algorithm called Dip'n'Sub. We
highlight in extensive experiments the various benefits of our proposal.
( 3
min )
Time-series anomaly detection deals with the problem of detecting anomalous
timesteps by learning normality from the sequence of observations. However, the
concept of normality evolves over time, leading to a "new normal problem",
where the distribution of normality can be changed due to the distribution
shifts between training and test data. This paper highlights the prevalence of
the new normal problem in unsupervised time-series anomaly detection studies.
To tackle this issue, we propose a simple yet effective test-time adaptation
strategy based on trend estimation and a self-supervised approach to learning
new normalities during inference. Extensive experiments on real-world
benchmarks demonstrate that incorporating the proposed strategy into the
anomaly detector consistently improves the model's performance over the
baselines, yielding robustness to distribution shifts.
( 2
min )
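One plausible instantiation of the idea, estimating the trend and updating it during inference, is an exponential-moving-average detector that scores detrended residuals. This is a simplified sketch under stated assumptions, not the paper's exact strategy; the class name and parameters are invented for illustration:

```python
import numpy as np

class TrendAdaptiveDetector:
    """Sketch of test-time adaptation for time-series anomaly scoring.

    An exponential moving average tracks the evolving "normal" level of the
    stream, so the notion of normality adapts as the distribution shifts;
    anomaly scores are computed on the detrended residual.
    """
    def __init__(self, alpha=0.05, threshold=3.0):
        self.alpha = alpha          # EMA smoothing factor
        self.threshold = threshold  # score cutoff in residual std units
        self.mean = None            # running trend estimate
        self.var = 1.0              # running residual variance

    def score(self, x):
        if self.mean is None:
            self.mean = x
            return 0.0
        resid = x - self.mean
        score = abs(resid) / np.sqrt(self.var + 1e-8)
        # adapt the normality estimate during inference (test-time adaptation)
        self.mean += self.alpha * resid
        self.var = (1 - self.alpha) * self.var + self.alpha * resid ** 2
        return score

    def is_anomaly(self, x):
        return self.score(x) > self.threshold
```

A slowly drifting series stays low-scoring because the trend estimate follows it, while an abrupt spike stands out against the adapted baseline.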
We provide an optimized implementation of the forward pass of
FlashAttention-2, a popular memory-aware scaled dot-product attention
algorithm, as a custom fused CUDA kernel targeting NVIDIA Hopper architecture
and written using the open-source CUTLASS library. In doing so, we explain the
challenges and techniques involved in fusing online-softmax with back-to-back
GEMM kernels, utilizing the Hopper-specific Tensor Memory Accelerator (TMA) and
Warpgroup Matrix-Multiply-Accumulate (WGMMA) instructions, defining and
transforming CUTLASS Layouts and Tensors, overlapping copy and GEMM operations,
and choosing optimal tile sizes for the Q, K and V attention matrices while
balancing the register pressure and shared memory utilization. In head-to-head
benchmarks on a single H100 PCIe GPU for some common choices of
hyperparameters, we observe 20-50% higher FLOPs/s over a version of
FlashAttention-2 optimized for last-generation NVIDIA Ampere architecture.
( 2
min )
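The online-softmax trick that makes this fusion possible can be illustrated in plain NumPy for a single query row. This is a didactic sketch of the algorithmic idea only, far from the tiled Hopper kernel the paper describes:

```python
import numpy as np

def online_softmax_weighted_sum(scores, values):
    """One-pass online softmax fused with a weighted sum: the trick that lets
    FlashAttention combine softmax(QK^T) with the multiply by V without
    materializing the full attention row.

    scores: (n,) attention logits for one query row
    values: (n, d) value vectors
    Returns softmax(scores) @ values, computed in a single streaming pass.
    """
    m = -np.inf                       # running max of the logits
    l = 0.0                           # running sum of exp(score - m)
    acc = np.zeros(values.shape[1])   # running weighted sum, scaled by exp(-m)
    for s, v in zip(scores, values):
        m_new = max(m, s)
        correction = np.exp(m - m_new)  # rescale earlier partial results
        l = l * correction + np.exp(s - m_new)
        acc = acc * correction + np.exp(s - m_new) * v
        m = m_new
    return acc / l
```

Each step rescales the accumulators by `exp(m - m_new)` when a new maximum appears, which keeps the exponentials numerically stable while visiting the row only once.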
This paper introduces a novel approach to topic modeling that utilizes latent
codebooks from a Vector-Quantized Variational Auto-Encoder~(VQ-VAE), which
discretely encapsulate the rich information of pre-trained embeddings, such as
those from a pre-trained language model. From a novel interpretation of the
latent codebooks and embeddings as a conceptual bag-of-words, we propose a new
generative topic model called Topic-VQ-VAE~(TVQ-VAE), which inversely generates
the original documents related to each latent codebook. The TVQ-VAE
can visualize the topics with various generative distributions including the
traditional BoW distribution and the autoregressive image generation. Our
experimental results on document analysis and image generation demonstrate that
TVQ-VAE effectively captures the topic context which reveals the underlying
structures of the dataset and supports flexible forms of document generation.
Official implementation of the proposed TVQ-VAE is available at
https://github.com/clovaai/TVQ-VAE.
( 2
min )
The unprecedented performance of machine learning models in recent years,
particularly Deep Learning and transformer models, has resulted in their
application in various domains such as finance, healthcare, and education.
However, the models are error-prone and cannot be used autonomously, especially
in decision-making scenarios where, technically or ethically, the cost of error
is high. Moreover, because of the black-box nature of these models, it is
frequently difficult for the end user to comprehend the models' outcomes and
underlying processes to trust and use the model outcome to make a decision.
Explainable Artificial Intelligence (XAI) aids end-user understanding of the
model by utilizing approaches, including visualization techniques, to explain
and interpret the inner workings of the model and how it arrives at a result.
Although numerous research studies have been conducted recently focusing on the
performance of models and the XAI approaches, less work has been done on the
impact of explanations on human-AI team performance. This paper surveyed the
recent empirical studies on XAI's impact on human-AI decision-making,
identified the challenges, and proposed future research directions.
( 2
min )
In this article we offer a comprehensive analysis of the Urysohn classifier
in a binary classification context. It utilizes Urysohn's Lemma from topology
to construct separating functions, providing rigorous and adaptable solutions.
Numerical experiments demonstrated exceptional performance, with scores ranging
from 95% to 100%. Notably, the Urysohn classifier outperformed CatBoost and
KNN in various scenarios. Despite sensitivity to the p-metric parameter, it
proved robust and adaptable. The classifier's mathematical rigor and
adaptability make it promising for binary classification, with applications in
medical diagnosis, fraud detection, and cyber security. Future research includes
parameter optimization and combining the Urysohn classifier with other
techniques. It offers an elegant and principled approach to classification,
ensuring integrity and valuable data insights.
( 2
min )
Air pollution results from multiple sources, both natural and anthropogenic.
The rapid urbanization of cities such as Bujumbura, the economic capital of
Burundi, is one of these factors. The first characterization of the
spatio-temporal variability of PM2.5 in Bujumbura, together with the
forecasting of PM2.5 concentration, was conducted in this paper using data
collected over one year, from August 2022 to August 2023, by low-cost sensors
installed in Bujumbura city. For each commune, hourly, daily, and seasonal
analyses were carried out, and the results showed that the mass concentrations
of PM2.5 in the three municipalities differ from one commune to another. The
average hourly and annual PM2.5 concentrations, ranging between 28.3 and 35.0
microgram/m3, exceed the World Health Organization standards. To predict PM2.5
concentration, a recurrent neural network (RNN) with Long Short-Term Memory
(LSTM) was investigated.
( 2
min )
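For readers unfamiliar with the model class, the recurrence at the heart of an LSTM forecaster looks like this single-cell step in NumPy. This is an illustrative sketch only; the paper presumably trains stacked, learned cells on the sensor series, and all shapes and names here are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step: gates decide what to forget, what to write, and what to
    expose, letting the cell state c carry long-range information such as
    daily or seasonal PM2.5 patterns.

    x: (d_in,) current input; h, c: (d_h,) previous hidden/cell states
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,) parameters
    """
    d_h = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:d_h])         # input gate
    f = sigmoid(z[d_h:2*d_h])    # forget gate
    o = sigmoid(z[2*d_h:3*d_h])  # output gate
    g = np.tanh(z[3*d_h:])       # candidate update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Unrolling this cell over a window of past readings and regressing the final hidden state onto the next value is the standard forecasting setup.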
Recent advances in autonomous robotic technologies have highlighted the
growing need for precise environmental analysis. LiDAR semantic segmentation
has gained attention to accomplish fine-grained scene understanding by acting
directly on raw content provided by sensors. Recent solutions showed how
different learning techniques can be used to improve the performance of the
model, without any architectural or dataset change. Following this trend, we
present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK)
derived from a standard model. First, classes are clustered into macro groups
according to mutual prediction errors; then, the learning process is
regularized by: (1) aligning class-conditional prototypical feature
representation for both fine and coarse classes, (2) weighting instances with a
per-class fairness index. Our LEAK approach is very general and can be
seamlessly applied on top of any segmentation architecture; indeed,
experimental results showed that it enables state-of-the-art performance on
different architectures, datasets and tasks, while ensuring more balanced
class-wise results and faster convergence.
( 2
min )
Offline reinforcement learning (RL) aims to learn an effective policy from a
pre-collected dataset. Most existing works develop sophisticated learning
algorithms, with less emphasis on improving the data collection process.
Moreover, it is challenging to extend beyond the single-task setting and
collect a task-agnostic dataset that allows an agent to perform multiple
downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised
Data Collection (CUDC) method to expand feature space using adaptive temporal
distances for task-agnostic data collection and ultimately improve learning
efficiency and capabilities for multi-task offline RL. To achieve this, CUDC
estimates the probability of the k-step future states being reachable from the
current states, and adapts how many steps into the future the dynamics
model should predict. With this adaptive reachability mechanism in place, the
feature representation can be diversified, and the agent can navigate itself to
collect higher-quality data with curiosity. Empirically, CUDC surpasses
existing unsupervised methods in efficiency and learning performance in various
downstream offline RL tasks of the DeepMind control suite.
( 2
min )
The Distributional Random Forest (DRF) is a recently introduced Random Forest
algorithm to estimate multivariate conditional distributions. Due to its
general estimation procedure, it can be employed to estimate a wide range of
targets such as conditional average treatment effects, conditional quantiles,
and conditional correlations. However, only results about the consistency and
convergence rate of the DRF prediction are available so far. We characterize
the asymptotic distribution of DRF and develop a bootstrap approximation of it.
This allows us to derive inferential tools for quantifying standard errors and
the construction of confidence regions that have asymptotic coverage
guarantees. In simulation studies, we empirically validate the developed theory
for inference of low-dimensional targets and for testing distributional
differences between two populations.
( 2
min )
We establish novel rates for the Gaussian approximation of random deep neural
networks with Gaussian parameters (weights and biases) and Lipschitz activation
functions, in the wide limit. Our bounds apply to the joint output of a
network evaluated on any finite input set, provided a certain non-degeneracy
condition of the infinite-width covariances holds. We demonstrate that the
distance between the network output and the corresponding Gaussian
approximation scales inversely with the width of the network, exhibiting faster
convergence than the naive heuristic suggested by the central limit theorem. We
also apply our bounds to obtain theoretical approximations for the exact
Bayesian posterior distribution of the network, when the likelihood is a
bounded Lipschitz function of the network output evaluated on a (finite)
training set. This includes popular cases such as the Gaussian likelihood, i.e.
exponential of minus the mean squared error.
( 2
min )
Today we are excited to announce that the Llama Guard model is now available for customers using Amazon SageMaker JumpStart. Llama Guard provides input and output safeguards in large language model (LLM) deployment. It’s one of the components under Purple Llama, Meta’s initiative featuring open trust and safety tools and evaluations to help developers build […]
( 15
min )
In this post, you learn how to prepare data sourced from Amazon Security Lake, and then train and deploy an ML model using an IP Insights algorithm in SageMaker. This model identifies anomalous network traffic or behavior which can then be composed as part of a larger end-to-end security solution.
( 13
min )
In this issue of Research Focus: Optimized exit-augmented models for scalable efficient inference; NeurIPS LLM Efficiency Challenge; LLM-empowered automated data exploration; Boosting cloud efficiency with data-driven decision-making and optimization.
The post Research Focus: Week of December 18, 2023 appeared first on Microsoft Research.
( 9
min )
Outside the glare of the klieg lights that ChatGPT commanded this year, a troupe of autonomous machines nudged the frontiers of robotics forward. Here are six that showed special prowess — swimming, diving, gripping, seeing, strolling and flying through 2023. A Media Darling at CES Ella — a smart stroller from startup Glüxkind Technologies, of
( 7
min )
Thomson Reuters, the global content and technology company, is transforming the legal industry with generative AI. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Thomson Reuters Chief Product Officer David Wong about its potential — and implications. Many of Thomson Reuters offerings for the legal industry either address an information
( 6
min )
The latest OpenUSD updates enable users to tackle larger, more complex scenes with enhanced geometry control and streamlined asset management.
( 7
min )
These compounds can kill methicillin-resistant Staphylococcus aureus (MRSA), a bacterium that causes deadly infections.
( 10
min )
This new method draws on 200-year-old geometric foundations to give artists control over the appearance of animated characters.
( 10
min )
The latest industrial inference engines, such as FasterTransformer and
TurboTransformers, have verified that half-precision floating point (FP16) and
8-bit integer (INT8) quantization can greatly improve model inference speed.
However, the existing INT8 quantization methods are too complicated, and
improper usage can greatly degrade model performance. In this paper, we
develop a toolkit for users to easily quantize their models for inference, in
which Self-Adaptive Mixed-Precision (SAMP) is proposed to automatically control
quantization rate by a mixed-precision architecture to balance model accuracy
and efficiency. Experimental results show that our SAMP toolkit has a higher
speedup than PyTorch and FasterTransformer while ensuring the required
accuracy. In addition, SAMP is based on a modular design, decoupling the
tokenizer, embedding, encoder and target layers, which allows users to handle
various downstream tasks and can be seamlessly integrated into PyTorch.
( 2
min )
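The basic INT8 building block such toolkits automate is symmetric quantization of a tensor. The sketch below shows the round-trip for illustration only; SAMP's self-adaptive mixed-precision policy is more involved than a single per-tensor scale:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map floats in
    [-max|w|, max|w|] onto integers in [-127, 127] via a single scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values; rounding error is at most scale / 2."""
    return q.astype(np.float32) * scale
```

Inference kernels keep the weights in INT8 and fold `scale` into the surrounding matmul, which is where the speedup over FP32/FP16 comes from.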
Online social media is integral to human life, facilitating messaging,
information sharing, and confidential communication while preserving privacy.
Platforms like Twitter, Instagram, and Facebook exemplify this phenomenon.
However, users face challenges due to network anomalies, often stemming from
malicious activities such as identity theft for financial gain or harm. This
paper proposes a novel method using user similarity measures and the Generative
Adversarial Network (GAN) algorithm to identify fake user accounts in the
Twitter dataset. Despite the problem's complexity, the method achieves an AUC
rate of 80% in classifying and detecting fake accounts. Notably, the study
builds on previous research, highlighting advancements and insights into the
evolving landscape of anomaly detection in online social networks.
( 2
min )
Much research has been devoted to the problem of learning fair
representations; however, existing methods do not explicitly model the
relationship between latent representations. In many real-world applications,
there may be causal relationships between latent representations. Furthermore,
most fair representation learning methods focus on group-level fairness and are
based on correlations, ignoring the causal relationships underlying the data.
In this work, we theoretically demonstrate that using structured
representations enables downstream predictive models to achieve counterfactual
fairness, and
then we propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to
obtain structured representations with respect to domain knowledge. The
experimental results show that the proposed method achieves better fairness and
accuracy performance than the benchmark fairness methods.
( 2
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework on two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
( 2
min )
We introduce Mesogeos, a large-scale multi-purpose dataset for wildfire
modeling in the Mediterranean. Mesogeos integrates variables representing
wildfire drivers (meteorology, vegetation, human activity) and historical
records of wildfire ignitions and burned areas for 17 years (2006-2022). It is
designed as a cloud-friendly spatio-temporal dataset, namely a datacube,
harmonizing all variables in a grid of 1km x 1km x 1-day resolution. The
datacube structure offers opportunities to assess machine learning (ML) usage
in various wildfire modeling tasks. We extract two ML-ready datasets that
establish distinct tracks to demonstrate this potential: (1) short-term
wildfire danger forecasting and (2) final burned area estimation given the
point of ignition. We define appropriate metrics and baselines to evaluate the
performance of models in each track. By publishing the datacube, along with the
code to create the ML datasets and models, we encourage the community to foster
the implementation of additional tracks for mitigating the increasing threat of
wildfires in the Mediterranean.
( 2
min )
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
( 2
min )
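The characterization invoked here is easy to state concretely: the squared Hellinger distance between two discrete distributions, whose inverse governs the unconstrained sample complexity. This is a sketch of the standard definition, not the paper's constrained-setting machinery:

```python
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance
    H^2(p, q) = 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2.
    Without communication constraints, simple binary hypothesis testing needs
    on the order of 1 / H^2 samples."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
```

Closer distributions have smaller H^2 and therefore require more samples to distinguish, which is the intuition behind the sample-complexity characterization.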
Traditional statistical feature selection methods often struggle on
high-dimension, low-sample-size data, encountering challenging problems such as
overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model free and distribution free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
( 2
min )
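The screening statistic in the second step can be made concrete with the classical (Székely-style) distance correlation, shown here as an accessible stand-in for the multivariate rank distance correlation of Deb and Sen (2021) that DeepFS actually uses:

```python
import numpy as np

def distance_correlation(x, y):
    """Classical distance correlation between paired samples: zero iff the
    samples look independent, close to one for strong (possibly nonlinear)
    dependence. Illustrative stand-in, not the rank-based variant of the paper.

    x: (n,) or (n, p) array; y: (n,) or (n, q) array with matched rows.
    """
    def centered(z):
        # pairwise distance matrix, double-centered
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()                     # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()    # product of distance variances
    return np.sqrt(dcov2) / dvar2 ** 0.25 if dvar2 > 0 else 0.0
```

In a screening pipeline, each candidate feature would be ranked by its dependence score against the low-dimensional representation, and the top-scoring features retained.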
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large - thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise to signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
As AI systems become more intelligent and their behavior becomes more
challenging to assess, they may learn to game the flaws of human feedback
instead of genuinely striving to follow instructions; however, this risk can be
mitigated by controlling how LLMs generalize human feedback to situations where
it is unreliable. To better understand how reward models generalize, we craft
69 distribution shifts spanning 8 categories. We find that reward models do not
learn to evaluate `instruction-following' by default and instead favor personas
that resemble internet text. Techniques for interpreting reward models'
internal representations achieve better generalization than standard
fine-tuning, but still frequently fail to distinguish instruction-following
from conflated behaviors. We consolidate the 15 most challenging distribution
shifts into the GENeralization analogIES (GENIES) benchmark, which we hope will
enable progress toward controlling reward model generalization.
( 2
min )
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used
extensively to train artificial neural networks. However, very little is known
about the extent to which SGD is crucial to the success of this technology and,
in particular, how effective it is in optimizing high-dimensional non-convex
cost functions as compared to other optimization algorithms such as Gradient
Descent (GD). In this work we leverage dynamical mean field theory to benchmark
its performance in the high-dimensional limit. To do that, we consider the
problem of recovering a hidden high-dimensional non-linearly encrypted signal,
a prototype high-dimensional non-convex hard optimization problem. We compare
the performance of SGD with that of GD and show that SGD largely outperforms GD
for sufficiently small batch sizes. In particular, a power law fit of the
relaxation time of these algorithms shows that the recovery threshold for SGD
with small batch size is smaller than the corresponding one of GD.
( 2
min )
With the rapid growth of edge intelligence, the deployment of federated
learning (FL) over wireless networks has garnered increasing attention, which
is called Federated Edge Learning (FEEL). In FEEL, mobile devices both
transmit model parameters over noisy channels and collect data in diverse
environments, which poses challenges to the generalization of trained models.
Moreover, devices can engage in decentralized FL via Device-to-Device
communication while the communication topology of connected devices also
impacts the generalization of models. Most recent theoretical studies overlook
the incorporation of all these effects into FEEL when developing generalization
analyses. In contrast, our work presents an information-theoretic
generalization analysis for topology-aware FEEL in the presence of data
heterogeneity and noisy channels. Additionally, we propose a novel
regularization method called Federated Global Mutual Information Reduction
(FedGMIR) to enhance the performance of models based on our analysis. Numerical
results validate our theoretical findings and provide evidence for the
effectiveness of the proposed method.
( 2
min )
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers standard in the fair machine learning literature that satisfy this
condition. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
( 2
min )
In the domain of music and sound processing, pitch extraction plays a pivotal
role. Our research presents a specialized convolutional neural network designed
for pitch extraction, particularly from the human singing voice in a cappella
performances. Notably, our approach combines synthetic data with auto-labeled
a cappella sung audio, creating a robust training environment. Evaluation across
datasets comprising synthetic sounds, opera recordings, and time-stretched
vowels demonstrates its efficacy. This work paves the way for enhanced pitch
extraction in both music and voice settings.
( 2
min )
Second-order methods for deep learning -- such as KFAC -- can be useful for
neural net training. However, they are often memory-inefficient and numerically
unstable for low-precision training since their preconditioning Kronecker
factors are dense, and require high-precision matrix inversion or
decomposition. Consequently, such methods are not widely used for training
large neural networks such as transformer-based models. We address these two
issues by (i) formulating an inverse-free update of KFAC and (ii) imposing
structures in each of the Kronecker factors, resulting in a method we term
structured inverse-free natural gradient descent (SINGD). On large modern
neural networks, we show that, in contrast to KFAC, SINGD is memory efficient
and numerically robust, and often outperforms AdamW even in half precision.
Hence, our work closes a gap between first-order and second-order methods in
modern low precision training for large neural nets.
( 2
min )
This paper considers learning the hidden causal network of a linear networked
dynamical system (NDS) from the time series data at some of its nodes --
partial observability. The dynamics of the NDS are driven by colored noise that
generates spurious associations across pairs of nodes, rendering the problem
much harder. To address the challenge of noise correlation and partial
observability, we assign to each pair of nodes a feature vector computed from
the time series data of observed nodes. The feature embedding is engineered to
yield structural consistency: there exists an affine hyperplane that
consistently partitions the set of features, separating the feature vectors
corresponding to connected pairs of nodes from those corresponding to
disconnected pairs. The causal inference problem is thus addressed via
clustering the designed features. We demonstrate with simple baseline
supervised methods the competitive performance of the proposed causal inference
mechanism under broad connectivity regimes and noise correlation levels,
including a real world network. Further, we devise novel technical guarantees
of structural consistency for linear NDS under the considered regime.
( 3
min )
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
( 2
min )
Artificial intelligence (AI) and machine learning (ML) present revolutionary
opportunities to enhance our understanding of animal behavior and conservation
strategies. Using elephants, a crucial species in Africa's protected areas, as
our focal point, we delve into the role of AI and ML in their conservation.
Given the increasing amounts of data gathered from a variety of sensors like
cameras, microphones, geophones, drones, and satellites, the challenge lies in
managing and interpreting this vast data. New AI and ML techniques offer
solutions to streamline this process, helping us extract vital information that
might otherwise be overlooked. This paper focuses on the different AI-driven
monitoring methods and their potential for improving elephant conservation.
Collaborative efforts between AI experts and ecological researchers are
essential in leveraging these innovative technologies for enhanced wildlife
conservation, setting a precedent for numerous other species.
( 2
min )
We present a new data-driven topological data analysis (TDA) approach for
estimating state spaces in dynamically changing human functional brain
networks. Our approach penalizes the topological distance between networks and
clusters dynamically changing brain networks into topologically distinct
states. Our method takes into account the temporal dimension of the data
through the Wasserstein distance between networks. Our method is shown to
outperform the k-means clustering widely used in estimating the state space
of brain networks. The method is applied to more accurately
determine the state spaces of dynamically changing functional brain networks.
Subsequently, we address the question of whether the overall topology of brain
networks is a heritable feature using the twin study design. MATLAB code for
the method is available at https://github.com/laplcebeltrami/PH-STAT.
( 2
min )
We propose a new homotopy-based conditional gradient method for solving
convex optimization problems with a large number of simple conic constraints.
Instances of this template naturally appear in semidefinite programming
problems arising as convex relaxations of combinatorial optimization problems.
Our method is a double-loop algorithm in which the conic constraint is treated
via a self-concordant barrier, and the inner loop employs a conditional
gradient algorithm to approximate the analytic central path, while the outer
loop updates the accuracy imposed on the intermediate solution and the homotopy
parameter. Our theoretical iteration complexity is competitive with that of
state-of-the-art SDP solvers, with the decisive advantage of cheap
projection-free subroutines. Preliminary numerical experiments illustrate the
practical performance of the method.
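The inner loop's appeal is that a conditional gradient (Frank-Wolfe) step needs only a linear minimization oracle, never a projection. A minimal sketch of that subroutine over the probability simplex (a stand-in feasible set for illustration; the paper's constraints are conic) is:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=2000):
    """Conditional gradient over the probability simplex.

    The linear minimization oracle is trivial here: the minimizing
    vertex is the coordinate with the smallest gradient entry. This
    is the kind of cheap projection-free subroutine the method relies on.
    """
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0      # LMO: best vertex of the simplex
        gamma = 2.0 / (t + 2.0)    # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```

Minimizing a smooth objective such as ||x - c||^2 for an interior point c drives the iterate toward c while every iterate stays feasible by construction.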
( 2
min )
In this paper, we introduce a novel predict-and-optimize method for
profit-driven churn prevention. We frame the task of targeting customers for a
retention campaign as a regret minimization problem. The main objective is to
leverage individual customer lifetime values (CLVs) to ensure that only the
most valuable customers are targeted. In contrast, many profit-driven
strategies focus on churn probabilities while considering average CLVs. This
often results in significant information loss due to data aggregation. Our
proposed model aligns with the guidelines of Predict-and-Optimize (PnO)
frameworks and can be efficiently solved using stochastic gradient descent
methods. Results from 12 churn prediction datasets underscore the effectiveness
of our approach, which achieves the best average performance compared to other
well-established strategies in terms of average profit.
( 2
min )
Deep Neural Networks are prone to learning spurious correlations embedded in
the training data, leading to potentially biased predictions. This poses risks
when deploying these models for high-stakes decision-making, such as in medical
applications. Current methods for post-hoc model correction either require
input-level annotations which are only possible for spatially localized biases,
or augment the latent feature space, thereby hoping to enforce the right
reasons. We present a novel method for model correction on the concept level
that explicitly reduces model sensitivity towards biases via gradient
penalization. When modeling biases via Concept Activation Vectors, we highlight
the importance of choosing robust directions, as traditional regression-based
approaches such as Support Vector Machines tend to result in diverging
directions. We effectively mitigate biases in controlled and real-world
settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet
and EfficientNet architectures. Code is available on
https://github.com/frederikpahde/rrclarc.
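A hedged sketch of the two ingredients: a mean-difference ("pattern") concept direction, which tends to be more stable than an SVM separating hyperplane, and a penalty on the latent gradient's component along it. Function names are illustrative, not the released API.

```python
import numpy as np

def pattern_cav(acts_with_bias, acts_without_bias):
    # Difference-of-means concept direction in latent space; a robust
    # alternative to fitting an SVM between the two activation sets.
    v = acts_with_bias.mean(axis=0) - acts_without_bias.mean(axis=0)
    return v / np.linalg.norm(v)

def rr_gradient_penalty(latent_grad, cav):
    # Penalize the model's sensitivity along the bias concept: the
    # squared projection of the latent gradient onto the CAV.
    return float((latent_grad @ cav) ** 2)
```

Gradients aligned with the concept direction are penalized; gradients orthogonal to it incur no cost, so sensitivity to the task-relevant features is preserved.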
( 2
min )
We study the problem of learning causal representations from unknown, latent
interventions in a general setting, where the latent distribution is Gaussian
but the mixing function is completely general. We prove strong identifiability
results given unknown single-node interventions, i.e., without having access to
the intervention targets. This generalizes prior works which have focused on
weaker classes, such as linear maps or paired counterfactual data. This is also
the first instance of causal identifiability from non-paired interventions for
deep neural network embeddings. Our proof relies on carefully uncovering the
high-dimensional geometric structure present in the data distribution after a
non-linear density transformation, which we capture by analyzing quadratic
forms of precision matrices of the latent distributions. Finally, we propose a
contrastive algorithm to identify the latent variables in practice and evaluate
its performance on various tasks.
( 2
min )
Neuro-Symbolic (NeSy) predictive models hold the promise of improved
compliance with given constraints, systematic generalization, and
interpretability, as they allow inferring labels that are consistent with some
prior knowledge by reasoning over high-level concepts extracted from
sub-symbolic inputs. It was recently shown that NeSy predictors are affected by
reasoning shortcuts: they can attain high accuracy but by leveraging concepts
with unintended semantics, thus coming short of their promised advantages. Yet,
a systematic characterization of reasoning shortcuts and of potential
mitigation strategies is missing. This work fills this gap by characterizing
them as unintended optima of the learning objective and identifying four key
conditions behind their occurrence. Based on this, we derive several natural
mitigation strategies, and analyze their efficacy both theoretically and
empirically. Our analysis shows reasoning shortcuts are difficult to deal with,
casting doubts on the trustworthiness and interpretability of existing NeSy
solutions.
( 2
min )
Recent works have shown that modern deep learning models can exhibit a
sparse double descent phenomenon. Indeed, as the sparsity of the model
increases, the test performance first worsens as the model overfits
the training data; then, the overfitting reduces, leading to an improvement in
performance; and finally, the model begins to forget critical information,
resulting in underfitting. Such behavior prevents the use of traditional
early-stopping criteria. In this work, we make three key contributions. First, we propose
a learning framework that avoids such a phenomenon and improves generalization.
Second, we introduce an entropy measure that provides more insight into the
emergence of this phenomenon and enables the use of traditional stopping
criteria. Third, we provide a comprehensive quantitative analysis of contingent
factors such as re-initialization methods, model width and depth, and dataset
noise. The contributions are supported by empirical evidence in typical setups.
Our code is available at https://github.com/VGCQ/DSD2.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of sparse or structured symmetric positive-definite
matrices with the affine-invariant metric. We do so by proposing a generalized
version of the Riemannian normal coordinates that dynamically orthonormalizes
the metric and locally converts the problem into an unconstrained problem in
the Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
We present CrystalBox, a novel, model-agnostic, posthoc explainability
framework for Deep Reinforcement Learning (DRL) controllers in the large family
of input-driven environments which includes computer systems. We combine the
natural decomposability of reward functions in input-driven environments with
the explanatory power of decomposed returns. We propose an efficient algorithm
to generate future-based explanations across both discrete and continuous
control environments. Using applications such as adaptive bitrate streaming and
congestion control, we demonstrate CrystalBox's capability to generate
high-fidelity explanations. We further illustrate its higher utility across
three practical use cases: contrastive explanations, network observability, and
guided reward design, as opposed to prior explainability techniques that
identify salient features.
( 2
min )
We study simple binary hypothesis testing under both local differential
privacy (LDP) and communication constraints. We qualify our results as either
minimax optimal or instance optimal: the former hold for the set of
distribution pairs with prescribed Hellinger divergence and total variation
distance, whereas the latter hold for specific distribution pairs. For the
sample complexity of simple hypothesis testing under pure LDP constraints, we
establish instance-optimal bounds for distributions with binary support;
minimax-optimal bounds for general distributions; and (approximately)
instance-optimal, computationally efficient algorithms for general
distributions. When both privacy and communication constraints are present, we
develop instance-optimal, computationally efficient algorithms that achieve the
minimum possible sample complexity (up to universal constants). Our results on
instance-optimal algorithms hinge on identifying the extreme points of the
joint range set $\mathcal A$ of two distributions $p$ and $q$, defined as
$\mathcal A := \{(\mathbf T p, \mathbf T q) | \mathbf T \in \mathcal C\}$,
where $\mathcal C$ is the set of channels characterizing the constraints.
( 2
min )
Transfer learning (TL) from pretrained deep models is a standard practice in
modern medical image classification (MIC). However, what levels of features to
be reused are problem-dependent, and uniformly finetuning all layers of
pretrained models may be suboptimal. This insight has partly motivated the
recent differential TL strategies, such as TransFusion (TF) and layer-wise
finetuning (LWFT), which treat the layers in the pretrained models
differentially. In this paper, we add one more strategy into this family,
called TruncatedTL, which reuses and finetunes appropriate bottom layers and
directly discards the remaining layers. This yields not only superior MIC
performance but also compact models for efficient inference, compared to other
differential TL methods. Our code is available at:
https://github.com/sun-umn/TTL
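Schematically, TruncatedTL is a slice-and-head operation. The framework-agnostic sketch below (not the repository's code; names are illustrative) treats a pretrained model as a list of layer callables:

```python
def truncated_tl(pretrained_layers, k, head):
    """Keep (and later finetune) only the bottom k pretrained layers,
    discard the rest, and attach a fresh task head on top. The result
    is both a better fit for low-level features and a smaller model
    at inference time."""
    kept = pretrained_layers[:k]

    def model(x):
        for layer in kept:
            x = layer(x)
        return head(x)

    return model
```

With three toy "layers" and k=2, the third layer is simply never executed, which is what yields the compact inference-time model.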
( 2
min )
Modeling and synthesizing real sRGB noise is crucial for various low-level
vision tasks. The distribution of real sRGB noise is highly complex and
affected by a multitude of factors, making its accurate modeling extremely
challenging. Therefore, recent studies have proposed methods that employ
data-driven generative models, such as generative adversarial networks (GAN)
and Normalizing Flows. These studies achieve more accurate modeling of sRGB
noise compared to traditional noise modeling methods. However, there are
performance limitations due to the inherent characteristics of each generative
model. To address this issue, we propose NM-FlowGAN, a hybrid approach that
exploits the strengths of both GAN and Normalizing Flows. We simultaneously
employ a pixel-wise noise modeling network based on Normalizing Flows, and
spatial correlation modeling networks based on GAN. In our experiments, our
NM-FlowGAN outperforms other baselines on the sRGB noise synthesis task.
Moreover, the denoising neural network, trained with synthesized image pairs
from our model, also shows superior performance compared to other baselines.
Our code is available at: https://github.com/YoungJooHan/NM-FlowGAN
( 2
min )
Although promising, existing defenses against query-based attacks share a
common limitation: they offer increased robustness against attacks at the price
of a considerable accuracy drop on clean samples. In this work, we show how to
efficiently establish, at test-time, a solid tradeoff between robustness and
accuracy when mitigating query-based attacks. Given that these attacks
necessarily explore low-confidence regions, our insight is that activating
dedicated defenses, such as RND (Qin et al., NeurIPS 2021) and Random Image
Transformations (Xie et al., ICLR 2018), only for low-confidence inputs is
sufficient to prevent them. Our approach is independent of training and
supported by theory. We verify the effectiveness of our approach for various
existing defenses by conducting extensive experiments on CIFAR-10, CIFAR-100,
and ImageNet. Our results confirm that our proposal can indeed enhance these
defenses by providing better tradeoffs between robustness and accuracy when
compared to state-of-the-art approaches while being completely training-free.
( 2
min )
We introduce a new technique called Drapes to enhance the sensitivity in
searches for new physics at the LHC. By training diffusion models on side-band
data, we show how background templates for the signal region can be generated
either directly from noise, or by partially applying the diffusion process to
existing data. In the partial diffusion case, data can be drawn from side-band
regions, with the inverse diffusion performed for new target conditional
values, or from the signal region, preserving the distribution over the
conditional property that defines the signal region. We apply this technique to
the hunt for resonances using the LHCO di-jet dataset, and achieve
state-of-the-art performance for background template generation using high
level input features. We also show how Drapes can be applied to low level
inputs with jet constituents, reducing the model dependence on the choice of
input observables. Using jet constituents we can further improve sensitivity to
the signal process, but observe a loss in performance where the signal
significance before applying any selection is below 4$\sigma$.
( 2
min )
Catastrophic forgetting remains a challenge for neural networks, especially
in lifelong learning scenarios. In this study, we introduce MEtaplasticity from
Synaptic Uncertainty (MESU), inspired by metaplasticity and Bayesian inference
principles. MESU harnesses synaptic uncertainty to retain information over
time, with its update rule closely approximating the diagonal Newton's method
for synaptic updates. Through continual learning experiments on permuted MNIST
tasks, we demonstrate MESU's remarkable capability to maintain learning
performance across 100 tasks without the need for explicit task boundaries.
( 2
min )
Applications of large language models (LLMs) like ChatGPT have potential to
enhance clinical decision support through conversational interfaces. However,
challenges of human-algorithmic interaction and clinician trust are poorly
understood. GutGPT, a LLM for gastrointestinal (GI) bleeding risk prediction
and management guidance, was deployed in clinical simulation scenarios
alongside the electronic health record (EHR) with emergency medicine
physicians, internal medicine physicians, and medical students to evaluate its
effect on physician acceptance and trust in AI clinical decision support
systems (AI-CDSS). GutGPT provides risk predictions from a validated machine
learning model and evidence-based answers by querying extracted clinical
guidelines. Participants were randomized to GutGPT and an interactive
dashboard, or the interactive dashboard and a search engine. Surveys and
educational assessments taken before and after measured technology acceptance
and content mastery. Preliminary results showed mixed effects on acceptance
after using GutGPT compared to the dashboard or search engine but appeared to
improve content mastery based on simulation performance. Overall, this study
demonstrates LLMs like GutGPT could enhance effective AI-CDSS if implemented
optimally and paired with interactive interfaces.
( 3
min )
A novel method, the Pareto Envelope Augmented with Reinforcement Learning
(PEARL), has been developed to address the challenges posed by multi-objective
problems, particularly in the field of engineering where the evaluation of
candidate solutions can be time-consuming. PEARL distinguishes itself from
traditional policy-based multi-objective Reinforcement Learning methods by
learning a single policy, eliminating the need for multiple neural networks to
independently solve simpler sub-problems. Several versions inspired from deep
learning and evolutionary techniques have been crafted, catering to both
unconstrained and constrained problem domains. Curriculum Learning is harnessed
to effectively manage constraints in these versions. PEARL's performance is
first evaluated on classical multi-objective benchmarks. Additionally, it is
tested on two practical PWR core Loading Pattern optimization problems to
showcase its real-world applicability. The first problem involves optimizing
the Cycle length and the rod-integrated peaking factor as the primary
objectives, while the second problem incorporates the mean average enrichment
as an additional objective. Furthermore, PEARL addresses three types of
constraints related to boron concentration, peak pin burnup, and peak pin
power. The results are systematically compared against a conventional approach,
the Non-dominated Sorting Genetic Algorithm. Notably, PEARL, specifically the
PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating
additional efforts from the algorithm designer, as opposed to a single
optimization with scaled objectives. It also outperforms the classical approach
across multiple performance metrics, including the Hyper-volume.
( 3
min )
Car following (CF) models are fundamental to describing traffic dynamics.
However, the CF behavior of human drivers is highly stochastic and nonlinear.
As a result, identifying the best CF model has been challenging and
controversial despite decades of research. Introduction of automated vehicles
has further complicated this matter as their CF controllers remain proprietary,
though their behavior appears different than human drivers. This paper develops
a stochastic learning approach to integrate multiple CF models, rather than
relying on a single model. The framework is based on approximate Bayesian
computation that probabilistically concatenates a pool of CF models based on
their relative likelihood of describing observed behavior. The approach, while
data-driven, retains physical tractability and interpretability. Evaluation
results using two datasets show that the proposed approach can better reproduce
vehicle trajectories for both human driven and automated vehicles than any
single CF model considered.
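A minimal numpy sketch of the pooling idea (an assumed form; the paper's approximate Bayesian computation machinery is richer): weight each candidate CF model by how closely its simulated trajectory reproduces the observed one, then mix model outputs with those weights.

```python
import numpy as np

def abc_pool_weights(observed, simulated_trajs, eps=1.0):
    # ABC-style weights: a kernel of the simulation-to-data
    # discrepancy, normalized across the pool of CF models.
    d = np.array([np.mean((s - observed) ** 2) for s in simulated_trajs])
    w = np.exp(-d / eps)
    return w / w.sum()
```

A model whose simulation matches the data exactly dominates the mixture, while poorly matching models receive exponentially small weight, which is what keeps the pooled model physically interpretable.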
( 2
min )
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers, standard in the fair machine learning literature, that satisfy
this condition. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
( 2
min )
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
( 2
min )
We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim.
28, 2018) to construct Lyapunov functions for optimization algorithms in
discrete and continuous time. For smooth, strongly convex objective functions,
we relax the requirements necessary for such a construction. As a result we are
able to prove for Polyak's ordinary differential equations and for a
two-parameter family of Nesterov algorithms rates of convergence that improve
on those available in the literature. We analyse the interpretation of Nesterov
algorithms as discretizations of the Polyak equation. We show that the
algorithms are instances of Additive Runge-Kutta integrators and discuss the
reasons why most discretizations of the differential equation do not result in
optimization algorithms with acceleration. We also introduce a modification of
Polyak's equation and study its convergence properties. Finally we extend the
general framework to the stochastic scenario and consider an application to
random algorithms with acceleration for overparameterized models; again we are
able to prove convergence rates that improve on those in the literature.
( 2
min )
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large - thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise to signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework to two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
( 2
min )
In modern federated learning, one of the main challenges is to account for
inherent heterogeneity and the diverse nature of data distributions for
different clients. This problem is often addressed by introducing
personalization of the models towards the data distribution of the particular
client. However, a personalized model might be unreliable when applied to the
data that is not typical for this client. Eventually, it may perform worse for
these data than the non-personalized global model trained in a federated way on
the data from all the clients. This paper presents a new approach to federated
learning that allows selecting a model from global and personalized ones that
would perform better for a particular input point. It is achieved through a
careful modeling of predictive uncertainties that helps to detect local and
global in- and out-of-distribution data and use this information to select the
model that is confident in a prediction. The comprehensive experimental
evaluation on the popular real-world image datasets shows the superior
performance of the model in the presence of out-of-distribution data while
performing on par with state-of-the-art personalized federated learning
algorithms in the standard scenarios.
( 2
min )
In this paper, we explore the capability of both the Adjacency Spectral
Embedding (ASE) and the Graph Encoder Embedding (GEE) for capturing an embedded
pseudo-clique structure in the random dot product graph setting. In both theory
and experiments, we demonstrate that this pairing of model and methods can
yield worse results than the best existing spectral clique detection methods,
demonstrating at once the methods' potential inability to capture even modestly
sized pseudo-cliques and the methods' robustness to the model contamination
giving rise to the pseudo-clique structure. To further enrich our analysis, we
also consider the Variational Graph Auto-Encoder (VGAE) model in our simulation
and real data experiments.
( 2
min )
Block majorization-minimization (BMM) is a simple iterative algorithm for
nonconvex optimization that sequentially minimizes a majorizing surrogate of
the objective function in each block coordinate while the other block
coordinates are held fixed. We consider a family of BMM algorithms for
minimizing smooth nonconvex objectives, where each parameter block is
constrained within a subset of a Riemannian manifold. We establish that this
algorithm converges asymptotically to the set of stationary points, and attains
an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$
iterations. In particular, the assumptions for our complexity results are
completely Euclidean when the underlying manifold is a product of Euclidean or
Stiefel manifolds, although our analysis makes explicit use of the Riemannian
geometry. Our general analysis applies to a wide range of algorithms with
Riemannian constraints: Riemannian MM, block projected gradient descent,
optimistic likelihood estimation, geodesically constrained subspace tracking,
robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate
that our algorithm converges faster than standard Euclidean algorithms applied
to the Riemannian setting.
( 2
min )
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method by actively
de-noising the observed data \footnote{Source code available at
\url{https://github.com/wz16/DVA}.}. By conducting a broad range of
experiments, we demonstrate that our proposed approach provides a much closer
approximation to the actual data uncertainty than the standard method.
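For reference, the "variance attenuation" objective in question is the heteroscedastic Gaussian negative log-likelihood; the sketch below (function name assumed) writes it in numpy. The paper's point is that the variance learned through this loss absorbs more than the true data noise.

```python
import numpy as np

def variance_attenuation_nll(y, mu, log_var):
    # Heteroscedastic Gaussian NLL (up to a constant): a large
    # predicted variance "attenuates" the squared error term but is
    # itself penalized through the log-variance term.
    return float(np.mean(0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var))))
```

On perfect predictions with unit variance the loss is zero, and inflating the predicted variance strictly raises it, showing the two opposing terms at work.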
( 2
min )
Current deep learning algorithms designed for automatic ECG analysis have
exhibited notable accuracy. However, akin to traditional electrocardiography,
they tend to be narrowly focused and typically address a singular diagnostic
condition. In this study, we specifically demonstrate the capability of a
single model to predict a diverse range of both cardiac and non-cardiac
discharge diagnoses based on a sole ECG collected in the emergency department.
Among the 1,076 hierarchically structured ICD codes considered, our model
achieves an AUROC exceeding 0.8 in 439 of them. This underscores the model's
proficiency in handling a wide array of diagnostic scenarios. We emphasize the
potential of utilizing this model as a screening tool, potentially integrated
into a holistic clinical decision support system for efficiently triaging
patients in the emergency department. This research underscores the remarkable
capabilities of comprehensive ECG analysis algorithms and the extensive range
of possibilities facilitated by the open MIMIC-IV-ECG dataset. Finally, our
data may play a pivotal role in revolutionizing the way ECG analysis is
performed, marking a significant advancement in the field.
( 2
min )
Traditional statistical feature selection methods often struggle when applied
to high-dimension, low-sample-size data, encountering problems such as
overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model free and distribution free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
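For illustration, the screening step can be sketched in plain Python using classical distance correlation (Székely et al.) as a simplified stand-in for the multivariate rank distance correlation of Deb and Sen (2021); function names and the ranking scheme here are illustrative, not the paper's exact design.

```python
# Sketch of a distance-correlation-based feature screen (illustrative
# stand-in for DeepFS's second step).

def _centered_dists(v):
    """Double-centered pairwise distance matrix of a 1-D sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; 0 iff asymptotically independent."""
    n = len(x)
    A, B = _centered_dists(x), _centered_dists(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvarx = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvary = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    if dvarx * dvary == 0:
        return 0.0
    return (dcov2 / (dvarx * dvary) ** 0.5) ** 0.5

def screen_features(features, target, top_k):
    """Rank candidate features by distance correlation with the target."""
    scored = sorted(((distance_correlation(f, target), i)
                     for i, f in enumerate(features)), reverse=True)
    return [i for _, i in scored[:top_k]]
```

In DeepFS the inputs to this screen would be the low-dimensional representations extracted by the network rather than raw features.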
( 2
min )
Random Forest (RF) is a machine learning method that offers many advantages,
including the ability to easily measure variable importance. Class balancing
techniques are a well-known solution to the class imbalance problem. However,
their effect on RF variable importance has not been actively studied. In this
paper, we study the effect of class balancing on RF variable importance. Our
simulation results show that over-sampling is effective in correctly measuring
variable importance in class imbalanced situations with small sample size,
while under-sampling fails to differentiate important and non-informative
variables. We then propose a variable selection algorithm that utilizes RF
variable importance and its confidence interval. Through an experimental study
using many real and artificial datasets, we demonstrate that our proposed
algorithm efficiently selects an optimal feature set, leading to improved
prediction performance in class imbalance problems.
( 2
min )
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
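As a concrete illustration of the unconstrained baseline mentioned above, the Hellinger distance between two finite distributions and the resulting sample-complexity scaling $n = \Theta(1/H^2)$ can be computed directly (constants omitted; the abstract's result adds at most a logarithmic factor under communication constraints):

```python
# Sketch: Hellinger distance and the order-of-magnitude sample count
# for simple binary hypothesis testing without communication constraints.

def hellinger(p, q):
    """Hellinger distance H(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i))."""
    bc = sum((pi * qi) ** 0.5 for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return max(0.0, 1.0 - bc) ** 0.5

def unconstrained_sample_complexity(p, q):
    """Samples needed to distinguish p from q, up to constants: 1 / H^2."""
    h2 = hellinger(p, q) ** 2
    return float('inf') if h2 == 0 else 1.0 / h2
```

Closer distributions thus require more samples, e.g. distinguishing (0.5, 0.5) from (0.6, 0.4) takes far more samples than from (0.9, 0.1).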
( 2
min )
We consider the problem of inferring latent stochastic differential equations
(SDEs) with a time and memory cost that scales independently with the amount of
data, the total length of the time series, and the stiffness of the approximate
differential equations. This is in stark contrast to typical methods for
inferring latent differential equations which, despite their constant memory
cost, have a time complexity that is heavily dependent on the stiffness of the
approximate differential equation. We achieve this computational advancement by
removing the need to solve differential equations when approximating gradients
using a novel amortization strategy coupled with a recently derived
reparametrization of expectations under linear SDEs. We show that, in practice,
this allows us to achieve similar performance to methods based on adjoint
sensitivities with more than an order of magnitude fewer evaluations of the
model in training.
( 2
min )
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its theoretical properties both in offline and online settings and propose
efficient algorithms with finite-sample theoretical guarantees. Our work
bridges the gap between theory and practice by linking our theoretical insights
with existing practical alignment algorithms such as Direct Preference
Optimization (DPO) and Rejection Sampling Optimization (RSO). Furthermore,
these findings and connections also offer both theoretical and practical
communities new tools and insights for future algorithmic design of alignment
algorithms.
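For concreteness, the DPO objective referenced above, as commonly written in the literature (for a prompt $x$ with preferred response $y_w$, dispreferred response $y_l$, reference policy $\pi_{\mathrm{ref}}$, temperature $\beta$, and logistic function $\sigma$), is

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],$$

which is exactly the maximum-likelihood objective induced by the reverse-KL regularized contextual bandit formulation under a Bradley-Terry preference model.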
( 2
min )
We derive a concentration bound of the type `for all $n \geq n_0$ for some
$n_0$' for TD(0) with linear function approximation. We work with online TD
learning with samples from a single sample path of the underlying Markov chain.
This makes our analysis significantly different from offline TD learning or TD
learning with access to independent samples from the stationary distribution of
the Markov chain. We treat TD(0) as a contractive stochastic approximation
algorithm, with both martingale and Markov noises. Markov noise is handled
using the Poisson equation and the lack of almost sure guarantees on
boundedness of iterates is handled using the concept of relaxed concentration
inequalities.
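A minimal sketch of the setting (not the paper's analysis): online TD(0) with linear features run on a single sample path of a two-state Markov chain. With one-hot features, symmetric transitions, reward 1 in state 0 and 0 in state 1, and $\gamma = 0.9$, the true values are $V = (5.5, 4.5)$; the chain, step sizes, and horizon below are illustrative choices.

```python
# Online TD(0) on one sample path of a two-state symmetric Markov chain.
import random

def td0_single_path(steps=200_000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]                    # one weight per state (one-hot features)
    s = 0
    for t in range(steps):
        s_next = rng.randrange(2)     # symmetric transition probabilities
        r = 1.0 if s == 0 else 0.0    # reward depends on the current state
        alpha = 100.0 / (1000.0 + t)  # Robbins-Monro step-size schedule
        td_error = r + gamma * w[s_next] - w[s]
        w[s] += alpha * td_error      # update only the visited state's weight
        s = s_next
    return w
```

Because the samples come from one trajectory, consecutive updates are correlated (the Markov noise handled via the Poisson equation in the abstract), unlike i.i.d. draws from the stationary distribution.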
( 2
min )
In the lead-up to next month's CES trade show in Las Vegas, NVIDIA will unveil its latest advancements in artificial intelligence, including generative AI, and a spectrum of other cutting-edge technologies. The company's special address, scheduled for Monday, Jan. 8, at 8 a.m. PT, will be publicly streamed.
( 5
min )
NVIDIA DLSS 3.5 for realistic ray-traced visuals is now available on D5 Render, a real-time 3D creation software.
( 7
min )
This post was written in collaboration with Ankur Goyal and Karthikeyan Chokappa from PwC Australia’s Cloud & Digital business. Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production […]
( 10
min )
Dementia diagnosis requires a series of different testing methods, which is
complex and time-consuming. Early detection of dementia is crucial as it can
prevent further deterioration of the condition. This paper utilizes a speech
recognition model to construct a dementia assessment system tailored for
Mandarin speakers during the picture description task. By training an
attention-based speech recognition model on voice data closely resembling
real-world scenarios, we have significantly enhanced the model's recognition
capabilities. Subsequently, we extracted the encoder from the speech
recognition model and added a linear layer for dementia assessment. We
collected Mandarin speech data from 99 subjects and acquired their clinical
assessments from a local hospital. We achieved an accuracy of 92.04% in
Alzheimer's disease detection and a mean absolute error of 9% in clinical
dementia rating score prediction.
( 2
min )
One of the challenges in deploying a machine learning model is that the
model's performance degrades as the operating environment changes. To maintain
the performance, streaming active learning is used, in which the model is
retrained by adding a newly annotated sample to the training dataset if the
prediction of the sample is not certain enough. Although many streaming active
learning methods have been proposed for classification, few efforts have been
made for regression problems, which are often handled in the industrial field.
In this paper, we propose to use the regression-via-classification framework
for streaming active learning for regression. Regression-via-classification
transforms regression problems into classification problems so that streaming
active learning methods proposed for classification problems can be applied
directly to regression problems. Experimental validation on four real data sets
shows that the proposed method can perform regression with higher accuracy at
the same annotation cost.
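A hypothetical sketch of the regression-via-classification transform described above: discretize the target into equal-width bins so any streaming active-learning classifier can operate on bin labels, then decode a predicted bin back to its midpoint. The helper names and binning scheme are illustrative, not the paper's exact design.

```python
# Regression-via-classification: equal-width target discretization.

def make_bins(y_min, y_max, k):
    """Equal-width bin edges covering [y_min, y_max]."""
    width = (y_max - y_min) / k
    return [y_min + i * width for i in range(k + 1)]

def to_class(y, edges):
    """Map a continuous target to a bin index (clipped to the last bin)."""
    for i in range(len(edges) - 2):
        if y < edges[i + 1]:
            return i
    return len(edges) - 2

def to_value(c, edges):
    """Decode a bin index back to the bin midpoint."""
    return (edges[c] + edges[c + 1]) / 2.0
```

The round-trip error is bounded by half the bin width, so the classifier's per-class uncertainty can drive the streaming annotation decision while predictions remain usable as regression outputs.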
( 2
min )
A common approach to learning mobile health (mHealth) intervention policies
is linear Thompson sampling. Two desirable mHealth policy features are (1)
pooling information across individuals and time and (2) incorporating a
time-varying baseline reward. Previous approaches pooled information across
individuals but not time, failing to capture trends in treatment effects over
time. In addition, these approaches did not explicitly model the baseline
reward, which limited the ability to precisely estimate the parameters in the
differential reward model. In this paper, we propose a novel Thompson sampling
algorithm, termed ``DML-TS-NNR'', that leverages (1) nearest-neighbors to
efficiently pool information on the differential reward function across users
and time and (2) the Double Machine Learning (DML) framework to explicitly
model baseline rewards and stay agnostic to the supervised learning algorithms
used. By explicitly modeling baseline rewards, we obtain smaller confidence
sets for the differential reward parameters. We offer theoretical guarantees on
the pseudo-regret, which are supported by empirical results. Importantly, the
DML-TS-NNR algorithm demonstrates robustness to potential misspecifications in
the baseline reward model.
( 2
min )
Move recognition in abstracts is crucial for effectively locating the content
and clarifying the structure of an article. Existing move recognition
algorithms lack the ability to learn word position information and thus miss
contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIE$\_$AT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37$\%$ higher accuracy on the split dataset than on the original
dataset and a 7.55$\%$ improvement in accuracy over the basic comparison model.
( 2
min )
While federated learning is promising for privacy-preserving collaborative
learning without revealing local data, it remains vulnerable to white-box
attacks and struggles to adapt to heterogeneous clients. Federated distillation
(FD), built upon knowledge distillation--an effective technique for
transferring knowledge from a teacher model to student models--emerges as an
alternative paradigm, which provides enhanced privacy guarantees and addresses
model heterogeneity. Nevertheless, challenges arise due to variations in local
data distributions and the absence of a well-trained teacher model, which leads
to misleading and ambiguous knowledge sharing that significantly degrades model
performance. To address these issues, this paper proposes a selective knowledge
sharing mechanism for FD, termed Selective-FD. It includes client-side
selectors and a server-side selector to accurately and precisely identify
knowledge from local and ensemble predictions, respectively. Empirical studies,
backed by theoretical insights, demonstrate that our approach enhances the
generalization capabilities of the FD framework and consistently outperforms
baseline methods.
( 2
min )
The influx of massive amounts of data from current and upcoming cosmological
surveys necessitates compression schemes that can efficiently summarize the
data with minimal loss of information. We introduce a method that leverages the
paradigm of self-supervised machine learning in a novel manner to construct
representative summaries of massive datasets using simulation-based
augmentations. Deploying the method on hydrodynamical cosmological simulations,
we show that it can deliver highly informative summaries, which can be used for
a variety of downstream tasks, including precise and accurate parameter
inference. We demonstrate how this paradigm can be used to construct summary
representations that are insensitive to prescribed systematic effects, such as
the influence of baryonic physics. Our results indicate that self-supervised
machine learning techniques offer a promising new approach for compression of
cosmological data as well as its analysis.
( 2
min )
Many functions characterising physical systems are additively separable. This
is the case, for instance, of mechanical Hamiltonian functions in physics,
population growth equations in biology, and consumer preference and utility
functions in economics. We consider the scenario in which a surrogate of a
function is to be tested for additive separability. The detection that the
surrogate is additively separable can be leveraged to improve further learning.
Hence, it is beneficial to have the ability to test for such separability in
surrogates. The mathematical approach is to test if the mixed partial
derivative of the surrogate is zero; or empirically, lower than a threshold. We
present and empirically compare eight methods for computing the mixed partial
derivative of a surrogate function.
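The test described above can be sketched with a central finite difference: for an additively separable $f(x, y) = g(x) + h(y)$ the mixed partial $\partial^2 f / \partial x \partial y$ is identically zero. The probe points and tolerance below are illustrative.

```python
# Empirical additive-separability test via the mixed partial derivative.

def mixed_partial(f, x, y, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx dy) at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

def is_additively_separable(f, points, tol=1e-4):
    """Declare f separable if the mixed partial is below tol at all probes."""
    return all(abs(mixed_partial(f, x, y)) < tol for x, y in points)

import math  # used by the separable example below
```

For example, $x^2 + \sin(y)$ passes the test while the product $xy$ fails it (its mixed partial is 1 everywhere).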
( 2
min )
While coresets have been growing in terms of their application, barring a few
exceptions, they have mostly been limited to unsupervised settings. We consider
supervised classification problems, and non-decomposable evaluation measures in
such settings. We show that stratified uniform-sampling-based coresets have
excellent empirical performance that is backed by theoretical guarantees as well.
We focus on the F1 score and Matthews Correlation Coefficient, two widely used
non-decomposable objective functions that are nontrivial to optimize for and
show that uniform coresets attain a lower bound for coreset size, and have good
empirical performance, comparable with ``smarter'' coreset construction
strategies.
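For reference, the two measures considered are computed from the full confusion matrix, which is what makes them non-decomposable: neither can be written as a sum of per-example losses.

```python
# F1 score and Matthews Correlation Coefficient from binary predictions.

def f1_score(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

def matthews_corrcoef(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0
```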
( 2
min )
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
High-resolution image generation with Generative Artificial Intelligence
(GenAI) has immense potential but, due to the enormous capital investment
required for training, it is increasingly centralised to a few large
corporations, and hidden behind paywalls. This paper aims to democratise
high-resolution GenAI by advancing the frontier of high-resolution generation
while remaining accessible to a broad audience. We demonstrate that existing
Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution
image generation. Our novel DemoFusion framework seamlessly extends open-source
GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated
Sampling mechanisms to achieve higher-resolution image generation. The
progressive nature of DemoFusion requires more passes, but the intermediate
results can serve as "previews", facilitating rapid prompt iteration.
( 2
min )
We study monotone submodular maximization under general matroid constraints
in the online setting. We prove that online optimization of a large class of
submodular functions, namely, weighted threshold potential functions, reduces
to online convex optimization (OCO). This is precisely because functions in
this class admit a concave relaxation; as a result, OCO policies, coupled with
an appropriate rounding scheme, can be used to achieve sublinear regret in the
combinatorial setting. We show that our reduction extends to many different
versions of the online learning problem, including the dynamic regret, bandit,
and optimistic-learning settings.
( 2
min )
The aim of this paper is to provide a theoretically founded investigation of
state-of-the-art learning approaches for inverse problems. We give an extended
definition of regularization methods and their convergence in terms of the
underlying data distributions, which paves the way for future theoretical
studies. Based on a simple spectral learning model previously introduced for
supervised learning, we investigate some key properties of different learning
paradigms for inverse problems, which can be formulated independently of
specific architectures. In particular we investigate the regularization
properties, bias, and critical dependence on training data distributions.
Moreover, our framework allows to highlight and compare the specific behavior
of the different paradigms in the infinite-dimensional limit.
( 2
min )
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows
method to final states containing multiple neutrinos. The architecture can
natively scale for all combinations of object types and multiplicities in the
final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton
events, the momenta of both neutrinos and correlations between them are
reconstructed more accurately than when using the most popular standard
analytical techniques, and solutions are found for all events. Inference time
is significantly faster than competing methods, and can be reduced further by
evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to
$t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded
distributions are much closer to the limit of performance set by perfect
neutrino reconstruction than standard techniques. For the chosen double
differential observables $\nu^2$-Flows results in improved statistical
precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino
Weighting method and up to a factor of four in comparison to the Ellipse
approach.
( 3
min )
The community has explored building private inference frameworks for
transformer-based large language models (LLMs) in a server-client setting,
where the server holds the model parameters and the client inputs its private
data (or prompt) for inference. However, these frameworks impose significant
overhead when the private inputs are forward propagated through the original
LLMs. In this paper, we show that substituting the computation- and
communication-heavy operators in the transformer architecture with
privacy-computing friendly approximations can greatly reduce the private
inference costs while incurring very minor impact on model performance.
Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing
friendly model inference pipeline achieves a $5\times$ acceleration in
computation and an 80% reduction in communication overhead, while retaining
nearly identical accuracy.
( 2
min )
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generated radiation fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generated radiation
fields.
( 2
min )
Biomedical entity linking (BioEL) has achieved remarkable progress with the
help of pre-trained language models. However, existing BioEL methods usually
struggle to handle rare and difficult entities due to long-tailed distribution.
To address this limitation, we introduce a new scheme $k$NN-BioEL, which
provides a BioEL model with the ability to reference similar instances from the
entire training corpus as clues for prediction, thus improving the
generalization capabilities. Moreover, we design a contrastive learning
objective with dynamic hard negative sampling (DHNS) that improves the quality
of the retrieved neighbors during inference. Extensive experimental results
show that $k$NN-BioEL outperforms state-of-the-art baselines on several
datasets.
( 2
min )
We present a deep Graph Convolutional Kernel Machine (GCKM) for
semi-supervised node classification in graphs. The method is built of two main
types of blocks: (i) We introduce unsupervised kernel machine layers
propagating the node features in a one-hop neighborhood, using implicit node
feature mappings. (ii) We specify a semi-supervised classification kernel
machine through the lens of the Fenchel-Young inequality. We derive an
effective initialization scheme and efficient end-to-end training algorithm in
the dual variables for the full architecture. The main idea underlying GCKM is
that, because of the unsupervised core, the final model can achieve higher
performance in semi-supervised node classification when few labels are
available for training. Experimental results demonstrate the effectiveness of
the proposed framework.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof-of-concept and we hope will inspire future developments
towards computationally efficient IRL.
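The shaping device referenced above is the classic potential-based construction $F(s, s') = \gamma \Phi(s') - \Phi(s)$, which leaves optimal policies unchanged; along a trajectory the shaping terms telescope, as this small sketch (with an illustrative potential) checks:

```python
# Potential-based reward shaping: F(s, s') = gamma * Phi(s') - Phi(s).

def shaped_reward(r, s, s_next, phi, gamma):
    """Original reward plus the potential-based shaping term."""
    return r + gamma * phi(s_next) - phi(s)

def shaped_return(rewards, states, phi, gamma):
    """Discounted return of the shaped rewards along one trajectory."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += gamma ** t * shaped_reward(r, states[t], states[t + 1], phi, gamma)
    return g
```

The telescoping identity says the shaped return equals the plain return plus $\gamma^T \Phi(s_T) - \Phi(s_0)$, so shaping only shifts returns by a policy-independent constant per start state.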
( 2
min )
The promise of Mobile Health (mHealth) is the ability to use wearable sensors
to monitor participant physiology at high frequencies during daily life to
enable temporally-precise health interventions. However, a major challenge is
frequent missing data. Despite a rich imputation literature, existing
techniques are ineffective for the pulsative signals which comprise many
mHealth applications, and a lack of available datasets has stymied progress. We
address this gap with PulseImpute, the first large-scale pulsative signal
imputation challenge which includes realistic mHealth missingness models, an
extensive set of baselines, and clinically-relevant downstream tasks. Our
baseline models include a novel transformer-based architecture designed to
exploit the structure of pulsative signals. We hope that PulseImpute will
enable the ML community to tackle this significant and challenging task.
( 2
min )
Can a machine or algorithm discover or learn Kepler's first law from
astronomical sightings alone? We emulate Johannes Kepler's discovery of the
equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a
physics-inspired tool for symbolic regression.
( 2
min )
Exact Bayesian inference on state-space models~(SSM) is in general
intractable, and unfortunately, basic Sequential Monte Carlo~(SMC) methods do
not yield correct approximations for complex models. In this paper, we propose
a mixed inference algorithm that computes closed-form solutions using belief
propagation as much as possible, and falls back to sampling-based SMC methods
when exact computations fail. This algorithm thus implements automatic
Rao-Blackwellization and is even exact for Gaussian tree models.
( 2
min )
Policy learning in robot-assisted surgery (RAS) lacks data efficient and
versatile methods that exhibit the desired motion quality for delicate surgical
interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a
novel method for imitation learning (IL) in RAS that focuses on gentle
manipulation of deformable objects. The approach combines the versatility of
diffusion-based imitation learning (DIL) with the high-quality motion
generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs).
This combination enables MPD to achieve gentle manipulation of deformable
objects, while maintaining data efficiency critical for RAS applications where
demonstration data is scarce. We evaluate MPD across various simulated tasks
and a real world robotic setup on both state and image observations. MPD
outperforms state-of-the-art DIL methods in success rate, motion quality, and
data efficiency.
( 2
min )
Venn Prediction (VP) is a new machine learning framework for producing
well-calibrated probabilistic predictions. In particular it provides
well-calibrated lower and upper bounds for the conditional probability of an
example belonging to each possible class of the problem at hand. This paper
proposes five VP methods based on Neural Networks (NNs), which is one of the
most widely used machine learning techniques. The proposed methods are
evaluated experimentally on four benchmark datasets and the obtained results
demonstrate the empirical well-calibratedness of their outputs and their
superiority over the outputs of the traditional NN classifier.
( 2
min )
Artificial Intelligence (AI) based image analysis has an immense potential to
support diagnostic histopathology, including cancer diagnostics. However,
developing supervised AI methods requires large-scale annotated datasets. A
potentially powerful solution is to augment training data with synthetic data.
Latent diffusion models, which can generate high-quality, diverse synthetic
images, are promising. However, the most common implementations rely on
detailed textual descriptions, which are not generally available in this
domain. This work proposes a method that constructs structured textual prompts
from automatically extracted image features. We experiment with the PCam
dataset, composed of tissue patches only loosely annotated as healthy or
cancerous. We show that including image-derived features in the prompt, as
opposed to only healthy and cancerous labels, improves the Fr\'echet Inception
Distance (FID) from 178.8 to 90.2. We also show that pathologists find it
challenging to detect synthetic images, with a median sensitivity/specificity
of 0.55/0.55. Finally, we show that synthetic data effectively trains AI
models.
( 3
min )
Offline reinforcement learning leverages pre-collected datasets of
transitions to train policies. It can serve as effective initialization for
online algorithms, enhancing sample efficiency and speeding up convergence.
However, when such datasets are limited in size and quality, offline
pre-training can produce sub-optimal policies and lead to degraded online
reinforcement learning performance. In this paper we propose a model-based data
augmentation strategy to maximize the benefits of offline reinforcement
learning pre-training and reduce the scale of data needed to be effective. Our
approach leverages a world model of the environment trained on the offline
dataset to augment states during offline pre-training. We evaluate our approach
on a variety of MuJoCo robotic tasks and our results show it can jump-start
online fine-tuning and substantially reduce - in some cases by an order of
magnitude - the required number of environment interactions.
( 2
min )
This paper studies the problem of CPRP, concept prerequisite relation
prediction, which is a fundamental task in using AI for education. CPRP is
usually formulated into a link-prediction task on a relationship graph of
concepts and solved by training the graph neural network (GNN) model. However,
current directed GNNs fail to manage graph isomorphism, i.e., to distinguish
non-isomorphic graphs, which reduces the expressivity of the resulting
representations. We present a permutation-equivariant directed GNN model by
introducing the Weisfeiler-Lehman test into directed GNN learning. Our method
is then used for CPRP and evaluated on three public datasets. The experimental
results show that our model delivers better prediction performance than the
state-of-the-art methods.
( 2
min )
In this paper we propose a new method for training neural networks (NNs) for
frequency modulated continuous wave (FMCW) radar mutual interference
mitigation. Instead of training NNs to regress from interfered to clean radar
signals as in previous work, we train NNs directly on object detection maps. We
do so by performing a continuous relaxation of the cell-averaging constant
false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm
for object detection using radar. With this new training objective we are able
to increase object detection performance by a large margin. Furthermore, we
introduce separable convolution kernels to strongly reduce the number of
parameters and computational complexity of convolutional NN architectures for
radar applications. We validate our contributions with experiments on
real-world measurement data and compare them against signal processing
interference mitigation methods.
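A 1-D sketch of the CA-CFAR detector and a sigmoid-based continuous relaxation of its hard threshold comparison, in the spirit of the training objective described above (window sizes, scale, and temperature here are illustrative, not the paper's exact formulation):

```python
import math

def _margins(x, num_train=4, num_guard=1, scale=3.0):
    """Margin x[i] - scale * (mean of training cells) for every cell."""
    n = len(x)
    margins = []
    for i in range(n):
        train = [x[j]
                 for j in list(range(i - num_guard - num_train, i - num_guard))
                 + list(range(i + num_guard + 1, i + num_guard + 1 + num_train))
                 if 0 <= j < n]
        noise = sum(train) / len(train) if train else 0.0
        margins.append(x[i] - scale * noise)
    return margins

def ca_cfar(x, **kw):
    """Hard CA-CFAR detections: 1 where the cell exceeds the threshold."""
    return [1 if m > 0 else 0 for m in _margins(x, **kw)]

def soft_cfar(x, temperature=0.1, **kw):
    """Continuous relaxation: sigmoid of the margin, differentiable in x."""
    return [1.0 / (1.0 + math.exp(-m / temperature)) for m in _margins(x, **kw)]
```

Replacing the hard comparison with the sigmoid makes detections differentiable with respect to the input signal, so a detection-map loss can be backpropagated through the detector into the denoising network.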
( 2
min )
This paper presents a method for learning Hamiltonian dynamics from a limited
set of data points. The Hamiltonian vector field is found by regularized
optimization over a reproducing kernel Hilbert space of vector fields that are
inherently Hamiltonian, and where the vector field is required to be odd or
even. This is done with a symplectic kernel, and it is shown how this
symplectic kernel can be modified to be odd or even. The performance of the
method is validated in simulations for two Hamiltonian systems. It is shown
that the learned dynamics are Hamiltonian, and that the learned Hamiltonian
vector field can be prescribed to be odd or even.
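One standard way to impose such parity on a kernel (illustrative of the modification described above, which the paper applies to a symplectic kernel) is to symmetrize it:

$$k_{\mathrm{odd}}(x, x') = \tfrac{1}{4}\big(k(x, x') - k(-x, x') - k(x, -x') + k(-x, -x')\big),$$
$$k_{\mathrm{even}}(x, x') = \tfrac{1}{4}\big(k(x, x') + k(-x, x') + k(x, -x') + k(-x, -x')\big).$$

Every function in the RKHS of $k_{\mathrm{odd}}$ satisfies $f(-x) = -f(x)$, and every function in the RKHS of $k_{\mathrm{even}}$ satisfies $f(-x) = f(x)$, since the symmetrized kernels are the projections of $k$ onto the odd and even subspaces.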
( 2
min )
Congenital heart disease (CHD) is a relatively rare disease that affects
patients at birth and results in extremely heterogeneous anatomical and
functional defects. 12-lead ECG signal is routinely collected in CHD patients
because it provides significant biomarkers for disease prognosis. However,
developing accurate machine learning models is challenging due to the lack of
large available datasets. Here, we suggest exploiting the Riemannian geometry
of the spatial covariance structure of the ECG signal to improve
classification. Firstly, we use covariance augmentation to mix samples across
the Riemannian geodesic between corresponding classes. Secondly, we suggest to
project the covariance matrices to their respective class Riemannian mean to
enhance the quality of feature extraction via tangent space projection. We
perform several ablation experiments and demonstrate significant improvement
compared to traditional machine learning models and deep learning on ECG time
series data.
( 2
min )
Despite being a unique source of information on patients' status and disease
progression, clinical notes are characterized by high levels of duplication and
information redundancy. In general domain text, it has been shown that
deduplication does not harm language model (LM) pretraining, thus helping
reduce the training cost. Although large LMs have proven to learn medical
knowledge, they still require specialized domain adaptation for improved
downstream clinical tasks. By leveraging large real-world clinical corpora, we
first provided a fine-grained characterization of duplicates stemming from
common writing practices and clinical relevancy. Second, we demonstrated that
deduplicating clinical text can help clinical LMs encode less redundant
information in a more efficient manner and does not harm classification tasks via
prompt-based learning.
( 2
min )
Binary code summarization, while invaluable for understanding code semantics,
is challenging due to its labor-intensive nature. This study delves into the
potential of large language models (LLMs) for binary code comprehension. To
this end, we present BinSum, a comprehensive benchmark and dataset of over 557K
binary functions and introduce a novel method for prompt synthesis and
optimization. To more accurately gauge LLM performance, we also propose a new
semantic similarity metric that surpasses traditional exact-match approaches.
Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2,
and Code Llama, reveals 10 pivotal insights. This evaluation consumed 4
billion inference tokens and incurred a total expense of 11,418 US dollars and
873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential
of LLMs in this field and the challenges yet to be overcome.
( 2
min )
Despite the remarkable advances in deep learning technology, achieving
satisfactory performance in lung sound classification remains a challenge due
to the scarcity of available data. Moreover, the respiratory sound samples are
collected from a variety of electronic stethoscopes, which could potentially
introduce biases into the trained models. When a significant distribution shift
occurs within the test dataset or in a practical scenario, it can substantially
decrease the performance. To tackle this issue, we introduce cross-domain
adaptation techniques, which transfer the knowledge from a source domain to a
distinct target domain. In particular, by considering different stethoscope
types as individual domains, we propose a novel stethoscope-guided supervised
contrastive learning approach. This method mitigates domain-related
disparities, enabling the model to distinguish respiratory sounds regardless of
recording variations across stethoscopes. The experimental results on the ICBHI
dataset demonstrate that the proposed methods are effective in reducing the
domain dependency and achieving the ICBHI Score of 61.71%, which is a
significant improvement of 2.16% over the baseline.
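A supervised contrastive objective such as the one described presumably treats same-class samples (here, informed by stethoscope domain) as positives. Below is a generic supervised contrastive loss in numpy, after Khosla et al. (2020), as a sketch of the underlying objective rather than the paper's exact formulation:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, pull same-label samples
    together and push all others apart in embedding space.
    z: (n, d) embeddings; labels: (n,) integer array; tau: temperature."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -per_anchor[pos.sum(axis=1) > 0].mean()
```

The temperature `tau` and the exact positive-pair definition (e.g. incorporating the stethoscope domain) are the knobs a domain-adaptation variant would modify.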
( 2
min )
Our study focuses on the potential for modifications of Inception-like
architecture within the electrocardiogram (ECG) domain. To this end, we
introduce IncepSE, a novel network characterized by strategic architectural
incorporation that leverages the strengths of both InceptionTime and channel
attention mechanisms. Furthermore, we propose a training setup that employs
stabilization techniques aimed at tackling the formidable challenges of the
severely imbalanced PTB-XL dataset and gradient corruption. By this means, we
set a new state of the art for supervised deep learning models across the
majority of tasks. Our model consistently surpasses InceptionTime and other
state-of-the-art models in this domain, notably with a 0.013 AUROC improvement
on the "all" task, while
also mitigating the inherent dataset fluctuations during training.
( 2
min )
$B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and
robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and
$B_0$ calibration scans can mitigate this but add scan time and cannot be
applied retrospectively to previously collected data. Here, we propose a
calibration-free sequence-adaptive deep-learning framework, to estimate and
correct for $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its
capability on arbitrary MRF sequences at 3T, where no training data were
previously obtained. Such an approach can be applied to any previously-acquired
and future MRF-scans. The flexibility in directly applying this framework to
other quantitative sequences is also highlighted.
( 2
min )
Uncertainty Quantification (UQ) has gained traction in an attempt to fix the
black-box nature of Deep Learning. Specifically (medical) biosignals such as
electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG) and electromyography (EMG) could benefit from good UQ, since these suffer
from a poor signal to noise ratio, and good human interpretability is pivotal
for medical applications and Brain Computer Interfaces. In this paper, we
review the state of the art at the intersection of Uncertainty Quantification
and Biosignals with Machine Learning. We present various methods, shortcomings,
uncertainty measures and theoretical frameworks that currently exist in this
application domain. Overall it can be concluded that promising UQ methods are
available, but that research is needed on how people and systems may interact
with an uncertainty model in a (clinical) environment.
( 2
min )
In this study, we propose an approach for predicting rare events by
exploiting time series in coevolution. Our approach involves a weighted
autologistic regression model, where we leverage the temporal behavior of the
data to enhance predictive capabilities. By addressing the issue of imbalanced
datasets, we establish constraints leading to weight estimation and to improved
performance. Evaluation on synthetic and real-world datasets confirms that our
approach outperforms state-of-the-art methods for predicting home equipment
failures.
( 2
min )
This study introduces an innovative 3D printed dry electrode tailored for
biosensing in postoperative recovery scenarios. Fabricated through a drop
coating process, the electrode incorporates a novel 2D material.
( 2
min )
Biased enhanced sampling methods utilizing collective variables (CVs) are
powerful tools for sampling conformational ensembles. Due to high intrinsic
dimensions, efficiently generating conformational ensembles for complex systems
requires enhanced sampling on high-dimensional free energy surfaces. While
methods like temperature-accelerated molecular dynamics (TAMD) can adopt many
CVs in a simulation, unbiasing the simulation requires accurate modeling of a
high-dimensional CV probability distribution, which is challenging for
traditional density estimation techniques. Here we propose an unbiasing method
based on the score-based diffusion model, a deep generative learning method
that excels in density estimation across complex data landscapes. We test the
score-based diffusion unbiasing method on TAMD simulations. The results
demonstrate that this unbiasing approach significantly outperforms traditional
unbiasing methods, and can generate accurate unbiased conformational ensembles
for simulations with more CVs than is typically feasible.
( 2
min )
Catastrophic forgetting (CF) is a significant challenge in continual learning
(CL). In regularization-based approaches to mitigate CF, modifications to
important training parameters are penalized in subsequent tasks using an
appropriate loss function. We propose the RTRA, a modification to the widely
used Elastic Weight Consolidation (EWC) regularization scheme, using the
Natural Gradient for loss function optimization. Our approach improves the
training of regularization-based methods without sacrificing test-data
performance. We compare the proposed RTRA approach against EWC using the
iFood251 dataset. We show that RTRA has a clear edge over the state-of-the-art
approaches.
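RTRA modifies how the EWC objective is optimized; the underlying EWC penalty itself is simple to state. A minimal sketch of that quadratic penalty (the natural-gradient modification is not shown, and the function name is ours):

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC regularizer: quadratic penalty on parameter drift, weighted by a
    (diagonal) Fisher information estimate from previous tasks."""
    return 0.5 * lam * sum(
        float((F * (p - p0) ** 2).sum())
        for p, p0, F in zip(params, old_params, fisher)
    )

# toy usage: one weight vector, uniform importance
p_old = [np.zeros(3)]
fisher = [np.ones(3)]
penalty = ewc_penalty([np.array([1.0, 0.0, 2.0])], p_old, fisher)
```

This term is added to the new task's loss, so parameters deemed important for old tasks (large Fisher entries) resist change.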
( 2
min )
Rehearsal-based techniques are commonly used to mitigate catastrophic
forgetting (CF) in Incremental learning (IL). The quality of the exemplars
selected is important for this purpose and most methods do not ensure the
appropriate diversity of the selected exemplars. We propose a new technique
"DSS" -- Diverse Selection of Samples from the input data stream in the
Class-incremental learning (CIL) setup under both disjoint and fuzzy task
boundary scenarios. Our method outperforms state-of-the-art methods and is much
simpler to understand and implement.
( 2
min )
We propose a novel exemplar selection approach based on Principal Component
Analysis (PCA) and median sampling, and a neural network training regime in the
setting of class-incremental learning. This approach avoids the pitfalls due to
outliers in the data and is both simple to implement and use across various
incremental machine learning models. It also has independent usage as a
sampling algorithm. We achieve better performance compared to state-of-the-art
methods.
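One plausible reading of combining PCA with median sampling is to rank samples by how close their principal-component scores lie to the class median, which naturally avoids outliers. This sketch is our own interpretation, not the paper's code:

```python
import numpy as np

def pca_median_exemplars(X, m):
    """Select m exemplars whose first-principal-component scores lie closest
    to the median score, avoiding outliers at the tails of the projection."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                                # first principal axis
    order = np.argsort(np.abs(scores - np.median(scores)))
    return order[:m]                                   # indices into X

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
idx = pca_median_exemplars(X, 10)
```

Such a selector has no trainable state, which is why it can also serve as a standalone sampling algorithm.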
( 2
min )
The goal of this series is to chronicle opinions and issues in the field of
machine learning as they stand today and as they change over time. The plan is
to host this survey periodically until the AI singularity
paperclip-frenzy-driven doomsday, keeping an updated list of topical questions
and interviewing new community members for each edition. In this issue, we
probed people's opinions on interpretable AI, the value of benchmarking in
modern NLP, the state of progress towards understanding deep learning, and the
future of academia.
( 2
min )
In this survey, we examine algorithms for conducting credit assignment in
artificial neural networks that are inspired or motivated by neurobiology,
unifying these various processes under one possible taxonomy. Our proposed
taxonomy is constructed based on how a learning algorithm answers a central
question underpinning the mechanisms of synaptic plasticity in complex adaptive
neuronal systems: where do the signals that drive the learning in individual
elements of a network come from and how are they produced? In this unified
treatment, we organize the ever-growing set of brain-inspired learning
processes into six general families and consider these in the context of
backpropagation of errors and its known criticisms. The results of this review
are meant to encourage future developments in neuro-mimetic systems and their
constituent learning processes, wherein lies the opportunity to build a strong
bridge between machine learning, computational neuroscience, and cognitive
science.
( 2
min )
In this paper we consider the adversarial contextual bandit problem in metric
spaces. The paper "Nearest neighbour with bandit feedback" tackled this problem
but when there are many contexts near the decision boundary of the comparator
policy it suffers from a high regret. In this paper we eradicate this problem,
designing an algorithm in which we can hold out any set of contexts when
computing our regret term. Our algorithm builds on that of "Nearest neighbour
with bandit feedback" and hence inherits its extreme computational efficiency.
( 2
min )
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof-of-concept that we hope will inspire future developments
towards computationally efficient IRL.
( 2
min )
There have been claims that artificial intelligence is bringing about increased productivity, accuracy, and a smarter workplace. In all of this excitement, it is difficult to differentiate between fact and fantasy. When it comes to the management of workforces, what is the truth there? Within the context of real-world applications, how much hype is there?
The post How can data science and AI help HR in workforce development, evaluation, and retention? appeared first on Data Science Central.
( 29
min )
Artificial intelligence (AI) is one of the most transformational technologies of our generation and provides opportunities to be a force for good and drive economic growth. The growth of large language models (LLMs), with hundreds of billions of parameters, has unlocked new generative AI use cases to improve customer experiences, boost employee productivity, and so […]
( 4
min )
This is a guest post co-written with Babu Srinivasan from MongoDB. As industries evolve in today’s fast-paced business landscape, the inability to have real-time forecasts poses significant challenges for industries heavily reliant on accurate and timely insights. The absence of real-time forecasts in various industries presents pressing business challenges that can significantly impact decision-making and […]
( 8
min )
In this episode of “AI Frontiers,” AI4Science Director Chris Bishop talks about the state of deep learning; his new textbook, “Deep Learning: Foundations and Concepts,” and the impact the field is having on the natural sciences.
The post AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop appeared first on Microsoft Research.
( 24
min )
Bilevel optimization has received more and more attention recently due to its
wide applications in machine learning. In this paper, we consider bilevel
optimization in decentralized networks. In particular, we propose a novel
single-loop algorithm for solving decentralized bilevel optimization with
strongly convex lower level problem. Our algorithm is fully single-loop and
does not require heavy matrix-vector multiplications when approximating the
hypergradient. Moreover, unlike existing methods for decentralized bilevel
optimization and federated bilevel optimization, our algorithm does not require
any gradient heterogeneity assumption. Our analysis shows that the proposed
algorithm achieves a sublinear convergence rate. Experimental results on
hyperparameter optimization problem with both synthetic and MNIST data sets
demonstrate the efficiency of the proposed algorithm.
( 2
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” I outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure that outcome. In part…
The post AI and Justice in a Brave New World: Part 3 – AI Governance appeared first on Data Science Central.
( 23
min )
In recent years, Transformer-based self-attention mechanisms have been
successfully applied to the analysis of a variety of context-reliant data
types, from texts to images and beyond, including data from non-Euclidean
geometries. In this paper, we present such a mechanism, designed to classify
sequences of Symmetric Positive Definite matrices while preserving their
Riemannian geometry throughout the analysis. We apply our method to automatic
sleep staging on time series of EEG-derived covariance matrices from a standard
dataset, obtaining high levels of stage-wise performance.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by including in the training phase
simultaneously (i) physical dependencies between spatial loss field and (ii)
measured pathloss values in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
Real-time monitoring of human behaviours, especially in e-Health
applications, has been an active area of research in the past decades. On top
of IoT-based sensing environments, anomaly detection algorithms have been
proposed for the early detection of abnormalities. Gradual change procedures,
commonly referred to as drift anomalies, have received much less attention in
the literature because they represent a much more challenging scenario than
sudden temporary changes (point anomalies). In this paper, we propose, for the
first time, a fully unsupervised real-time drift detection algorithm named
DynAmo, which can identify drift periods as they are happening. DynAmo
comprises a dynamic clustering component to capture the overall trends of
monitored behaviours and a trajectory generation component, which extracts
features from the densest cluster centroids. Finally, we apply an ensemble of
divergence tests on sliding reference and detection windows to detect drift
periods in the behavioural sequence.
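The final stage described above applies an ensemble of divergence tests on sliding reference and detection windows. The mechanism can be illustrated with a single Kolmogorov-Smirnov test standing in for the paper's ensemble (a simplified sketch, not DynAmo itself):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(series, ref_len=50, det_len=50, alpha=0.01):
    """Flag time indices where the detection window's distribution diverges
    from the preceding reference window (two-sample KS test)."""
    flags = []
    for t in range(ref_len + det_len, len(series)):
        ref = series[t - ref_len - det_len : t - det_len]
        det = series[t - det_len : t]
        if ks_2samp(ref, det).pvalue < alpha:
            flags.append(t)
    return flags

# toy behavioural signal with a gradual-looking shift at t = 200
rng = np.random.default_rng(0)
series = np.r_[rng.normal(0, 1, 200), rng.normal(3, 1, 200)]
flags = detect_drift(series)
```

In the real setting the tested sequence would be the trajectory features extracted from the densest cluster centroids, not the raw signal.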
( 2
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
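The accept-reject filter that removes the discretisation bias is the same mechanism used in the classical Metropolis-adjusted Langevin algorithm. For orientation, here is a plain Euclidean MALA step, not the mirror variant (which additionally passes through the mirror map and its Hessian); function names are ours:

```python
import numpy as np

def mala_step(x, grad_logpi, log_pi, h, rng):
    """One Metropolis-adjusted Langevin step: Langevin proposal, then an
    accept-reject correction that makes the chain unbiased w.r.t. pi."""
    def log_q(y, v):  # log density (up to a constant) of proposing y from v
        mu = v + h * grad_logpi(v)
        return -np.sum((y - mu) ** 2) / (4 * h)
    prop = x + h * grad_logpi(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    log_alpha = log_pi(prop) - log_pi(x) + log_q(x, prop) - log_q(prop, x)
    return prop if np.log(rng.uniform()) < log_alpha else x

# sample a standard Gaussian for a few steps
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(1000):
    x = mala_step(x, lambda v: -v, lambda v: -0.5 * np.sum(v ** 2), 0.1, rng)
```

Without the accept-reject line, the chain would target a perturbed distribution with an asymptotic bias in the step size, which is exactly the issue the paper's filter removes for the mirror dynamics.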
( 2
min )
(1) The enhanced capability of Graph Neural Networks (GNNs) in unsupervised
community detection of clustered nodes is attributed to their capacity to
encode both the connectivity and feature information spaces of graphs. The
identification of latent communities holds practical significance in various
domains, from social networks to genomics. Current real-world performance
benchmarks are perplexing due to the multitude of decisions influencing GNN
evaluations for this task. (2) Three metrics are compared to assess the
consistency of algorithm rankings in the presence of randomness. The
consistency and quality of performance are evaluated between results obtained
under hyperparameter optimisation and those obtained with the default
hyperparameters. (3)
The results compare hyperparameter optimisation with default hyperparameters,
revealing a significant performance loss when neglecting hyperparameter
investigation. A comparison of metrics indicates that ties in ranks can
substantially alter the quantification of randomness. (4) Ensuring adherence to
the same evaluation criteria may result in notable differences in the reported
performance of methods for this task. The $W$ Randomness coefficient, based on
the Wasserstein distance, is identified as providing the most robust assessment
of randomness.
( 3
min )
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems,
where a central operator assigns vehicles to customer requests or rejects these
with the aim of maximizing its total profit. Recent approaches use multi-agent
deep reinforcement learning (MADRL) to realize scalable yet performant
algorithms, but train agents based on local rewards, which distorts the reward
signal with respect to the system-wide profit, leading to lower performance. We
therefore propose a novel global-rewards-based MADRL algorithm for vehicle
dispatching in AMoD systems, which resolves previously existing goal conflicts
between the trained agents and the operator by assigning rewards to agents
leveraging a counterfactual baseline. Our algorithm shows statistically
significant improvements across various settings on real-world data compared to
state-of-the-art MADRL algorithms with local rewards. We further provide a
structural analysis which shows that the utilization of global rewards can
improve implicit vehicle balancing and demand forecasting abilities. Our code
is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
( 2
min )
We propose a framework that leverages foundation models as teachers, guiding
a reinforcement learning agent to acquire semantically meaningful behavior
without human feedback. In our framework, the agent receives task instructions
grounded in a training environment from large language models. Then, a
vision-language model guides the agent in learning the multi-task
language-conditioned policy by providing reward feedback. We demonstrate that
our method can learn semantically meaningful skills in a challenging open-ended
MineDojo environment while prior unsupervised skill discovery methods struggle.
Additionally, we discuss observed challenges of using off-the-shelf foundation
models as teachers and our efforts to address them.
( 2
min )
We present several methods for predicting the dynamics of Hamiltonian systems
from discrete observations of their vector field. Each method is either
informed or uninformed of the Hamiltonian property. We empirically and
comparatively evaluate the methods and observe that knowledge of the
Hamiltonian property can be effectively incorporated, and that different methods strike
different trade-offs between efficiency and effectiveness for different
dynamical systems.
( 2
min )
In real-world scenarios, classification models are often required to perform
robustly when predicting samples belonging to classes that have not appeared
during their training stage. Open Set Recognition addresses this issue by
devising models capable of detecting unknown classes from samples arriving
during the testing phase, while maintaining a good level of performance in the
classification of samples belonging to known classes. This review
comprehensively overviews the recent literature related to Open Set
Recognition, identifying common practices, limitations, and connections of this
field with other machine learning research areas, such as continual learning,
out-of-distribution detection, novelty detection, and uncertainty estimation.
Our work also uncovers open problems and suggests several research directions
that may motivate and articulate future efforts towards safer Artificial
Intelligence methods.
( 2
min )
Humanoid robots will be able to assist humans in their daily life, in
particular due to their versatile action capabilities. However, while these
robots need a certain degree of autonomy to learn and explore, they also should
respect various constraints, for access control and beyond. We explore the
novel field of incorporating privacy, security, and access control constraints
with robot task planning approaches. We report preliminary results on the
classical symbolic approach, deep-learned neural networks, and modern ideas
using large language models as a knowledge base. From analyzing their trade-offs,
we conclude that a hybrid approach is necessary, and thereby present a new use
case for the emerging field of neuro-symbolic artificial intelligence.
( 2
min )
In continual learning, networks confront a trade-off between stability and
plasticity when trained on a sequence of tasks. To bolster plasticity without
sacrificing stability, we propose a novel training algorithm called LRFR. This
approach optimizes network parameters in the null space of the past tasks'
feature representation matrix to guarantee stability. Concurrently, we
judiciously select only a subset of neurons in each layer of the network while
training individual tasks to learn the past tasks' feature representation
matrix in low-rank form. This increases the null space dimension when designing
network parameters for subsequent tasks, thereby enhancing the plasticity.
Using CIFAR-100 and TinyImageNet as benchmark datasets for continual learning,
the proposed approach consistently outperforms state-of-the-art methods.
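The stability half of the idea, updating parameters only in the null space of the past-task feature matrix, can be sketched directly (the low-rank neuron-selection half is omitted, and the function name is ours):

```python
import numpy as np

def null_space_project(grad, F, tol=1e-10):
    """Project a parameter update onto the null space of the past-task feature
    matrix F (rows = feature vectors), so outputs on past features are unchanged."""
    _, s, Vt = np.linalg.svd(F, full_matrices=True)
    rank = int((s > tol).sum())
    N = Vt[rank:].T              # orthonormal basis of null(F)
    return grad @ (N @ N.T)      # strip components that would alter F @ W.T
```

Since F @ N = 0, adding the projected update dW to a weight matrix W leaves F @ (W + dW).T equal to F @ W.T, which is the stability guarantee; shrinking the rank of F (as LRFR does via low-rank feature learning) enlarges the null space available for plasticity.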
( 2
min )
We propose HAROOD as a short-range FMCW radar-based human activity classifier
and out-of-distribution (OOD) detector. It aims to classify human sitting,
standing, and walking activities and to detect any other moving or stationary
object as OOD. We introduce a two-stage network. The first stage is trained
with a novel loss function that includes intermediate reconstruction loss,
intermediate contrastive loss, and triplet loss. The second stage uses the
first stage's output as its input and is trained with cross-entropy loss. It
creates a simple classifier that performs the activity classification. On our
dataset collected by 60 GHz short-range FMCW radar, we achieve an average
classification accuracy of 96.51%. Also, we achieve an average AUROC of 95.04%
as an OOD detector. Additionally, our extensive evaluations demonstrate the
superiority of HAROOD over the state-of-the-art OOD detection methods in terms
of standard OOD detection metrics.
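The first-stage objective combines reconstruction, contrastive, and triplet terms; the triplet component alone has the standard hinge form shown below (a generic sketch, not the paper's exact loss):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: pull each anchor toward its positive and push it
    at least `margin` farther from its negative (squared distances)."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()

a = np.zeros((1, 2))          # anchor
p = np.zeros((1, 2))          # positive, identical to anchor
n = np.array([[2.0, 0.0]])    # negative, at distance 2
```

A well-shaped embedding (negative beyond the margin) incurs zero loss, which is what lets the second-stage classifier operate on cleanly separated activity clusters.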
( 2
min )
We address the Continual Learning (CL) problem, where a model has to learn a
sequence of tasks from non-stationary distributions while preserving prior
knowledge as it encounters new experiences. With the advancement of foundation
models, CL research has shifted focus from the initial learning-from-scratch
paradigm to the use of generic features from large-scale pre-training. However,
existing approaches to CL with pre-trained models only focus on separating the
class-specific features from the final representation layer and neglect the
power of intermediate representations that capture low- and mid-level features
naturally more invariant to domain shifts. In this work, we propose LayUP, a
new class-prototype-based approach to continual learning that leverages
second-order feature statistics from multiple intermediate layers of a
pre-trained network. Our method is conceptually simple, does not require any
replay buffer, and works out of the box with any foundation model. LayUP
improves over the state-of-the-art on four of the seven class-incremental
learning settings at a considerably reduced memory and computational footprint
compared with the next best baseline. Our results demonstrate that fully
exhausting the representational capacities of pre-trained models in CL goes far
beyond their final embeddings.
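Class-prototype methods over frozen features often reduce to a ridge solution built from second-order (Gram) statistics of the features. The following is a generic single-layer sketch of that family (LayUP additionally draws features from multiple intermediate layers; the function name is ours):

```python
import numpy as np

def fit_prototypes(H, y, n_classes, lam=1.0):
    """Ridge classifier from second-order feature statistics (Gram matrix) of
    frozen features H; no replay buffer or gradient training needed."""
    Y = np.eye(n_classes)[y]            # one-hot targets
    G = H.T @ H                         # accumulated second-order statistics
    W = np.linalg.solve(G + lam * np.eye(H.shape[1]), H.T @ Y)
    return W                            # predict via argmax(H @ W, axis=1)

# toy features: two well-separated clusters
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(-2, 1, (50, 16)), rng.normal(2, 1, (50, 16))])
y = np.r_[np.zeros(50, int), np.ones(50, int)]
W = fit_prototypes(H, y, 2)
pred = (H @ W).argmax(axis=1)
```

Because G and H.T @ Y are sums over samples, they can be accumulated incrementally task by task, which is what makes the approach attractive for continual learning.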
( 2
min )
Deep Reinforcement Learning (DRL) has achieved remarkable advances in
sequential decision tasks. However, recent works have revealed that DRL agents
are susceptible to slight perturbations in observations. This vulnerability
raises concerns regarding the effectiveness and robustness of deploying such
agents in real-world applications. In this work, we propose a novel robust
reinforcement learning method called SortRL, which improves the robustness of
DRL policies against observation perturbations from the perspective of the
network architecture. We employ a novel architecture for the policy network
that incorporates global $l_\infty$ Lipschitz continuity and provide a
convenient method to enhance policy robustness based on the output margin.
Besides, a training framework is designed for SortRL, which solves given tasks
while maintaining robustness against $l_\infty$ bounded perturbations on the
observations. Several experiments are conducted to evaluate the effectiveness
of our method, including classic control tasks and video games. The results
demonstrate that SortRL achieves state-of-the-art robustness performance
against different perturbation strengths.
( 2
min )
Many neural network architectures have been shown to be Turing Complete, and
can thus implement arbitrary algorithms. However, Transformers are unique in
that they can implement gradient-based learning algorithms \emph{under simple
parameter configurations}. A line of recent work shows that linear Transformers
naturally learn to implement gradient descent (GD) when trained on a linear
regression in-context learning task. But the linearity assumption (either in
the Transformer architecture or in the learning task) is far from realistic
settings where non-linear activations crucially enable Transformers to learn
complicated non-linear functions. In this paper, we provide theoretical and
empirical evidence that non-linear Transformers can, and \emph{in fact do},
learn to implement learning algorithms to learn non-linear functions in
context. Our results apply to a broad class of combinations of non-linear
architectures, and non-linear in-context learning tasks. Interestingly, we show
that the optimal choice of non-linear activation depends in a natural way on
the non-linearity of the learning task.
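The reference algorithm in this line of work is easy to state: the Transformer's in-context prediction is compared against k steps of gradient descent on the prompt's regression loss, started from zero weights. A sketch of that baseline for the linear case (our own minimal version):

```python
import numpy as np

def gd_in_context(X, y, x_query, steps=1, lr=0.1):
    """Prediction from `steps` iterations of GD on the in-context least-squares
    loss, from w = 0 -- the algorithm linear Transformers are shown to emulate."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    return x_query @ w
```

In the non-linear setting studied here, the learned in-context algorithm is no longer plain GD on a linear model, and the paper's point is that the right non-linear activation tracks the non-linearity of the task.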
( 2
min )
Melanoma is a type of cancer that begins in the cells controlling the pigment
of the skin, and it is often referred to as the most dangerous skin cancer.
Diagnosing melanoma can be time-consuming, and a recent increase in melanoma
incidents indicates a growing demand for a more efficient diagnostic process.
This paper presents a pipeline for melanoma diagnostics, leveraging two
convolutional neural networks, a diagnosis, and a prognosis model. The
diagnostic model is responsible for localizing malignant patches across whole
slide images and delivering a patient-level diagnosis as malignant or benign.
Further, the prognosis model utilizes the diagnostic model's output to provide
a patient-level prognosis as good or bad. The full pipeline has an F1 score of
0.79 when tested on data from the same distribution as it was trained on.
( 2
min )
Polyp segmentation, a challenging problem in medical imaging, has seen numerous
proposed methods aimed at improving the quality of segmented masks. Currently,
state-of-the-art techniques yield impressive results. However, the sheer size
of these models poses challenges for practical industry applications. To
address this, we present a Knowledge Distillation framework, incorporating
attention supervision and the symmetrical guiding method. This framework is
designed to facilitate knowledge transfer from a teacher model to a more
compact student model with fewer parameters. Our experimental evaluation of the
framework assesses its effectiveness in enabling the student model to acquire
knowledge from the teacher efficiently. Additionally, our method serves to
prevent the student model from incorporating redundant features that could lead
to inaccurate predictions. Consequently, our method, boasting approximately 5
million parameters, achieves competitive results comparable to the
state-of-the-art approaches. The implementation can be found at:
https://github.com/huyquoctrinh/KDAS3
( 2
min )
In this work, we formally prove that, under certain conditions, if a neural
network is invariant to a finite group then its weights recover the Fourier
transform on that group. This provides a mathematical explanation for the
emergence of Fourier features -- a ubiquitous phenomenon in both biological and
artificial learning systems. The results hold even for non-commutative groups,
in which case the Fourier transform encodes all the irreducible unitary group
representations. Our findings have consequences for the problem of symmetry
discovery. Specifically, we demonstrate that the algebraic structure of an
unknown group can be recovered from the weights of a network that is at least
approximately invariant within certain bounds. Overall, this work contributes
to a foundation for an algebraic learning theory of invariant neural network
representations.
( 2
min )
This article presents a new methodology for extracting intervals when a home
is vacant from low-frequency electricity consumption data. The approach
combines multiple algorithms, including change point detection, classification,
period detection, and periodic spikes retrieval. It shows encouraging results
on both simulated and real consumption curves. This approach offers practical
insights for optimizing energy use and holds potential benefits for residential
consumers and utility companies in terms of energy cost reduction and
sustainability. Further research is needed to enhance its applicability in
diverse settings and with larger datasets.
( 2
min )
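One building block named above, change point detection, can be sketched as a least-squares split search on a consumption curve. This is a toy stand-in only; the paper's pipeline combines it with classification, period detection, and periodic spike retrieval, and the simulated data below is illustrative:

```python
import numpy as np

def single_changepoint(x):
    """Return the index that best splits x into two constant-mean
    segments under a least-squares cost. A toy stand-in for the change
    point detection step of the vacancy extraction pipeline."""
    n = len(x)
    best_k, best_cost = None, np.inf
    for k in range(2, n - 1):
        cost = k * x[:k].var() + (n - k) * x[k:].var()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Simulated daily consumption: occupied (high) then vacant (low baseline).
rng = np.random.default_rng(0)
x = np.r_[np.full(50, 2.0), np.full(50, 0.2)] + 0.05 * rng.standard_normal(100)
print(single_changepoint(x))  # ~50: the day the home becomes vacant
```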
In various scientific and engineering applications, there is typically an
approximate model of the underlying complex system, even though it contains
both aleatoric and epistemic uncertainties. In this paper, we present a
principled method to incorporate these approximate models as physics priors in
modeling, to prevent overfitting and enhance the generalization capabilities
of the trained models. Utilizing the structural risk minimization (SRM)
inductive principle pioneered by Vapnik, this approach structures the physics
priors into generalized regularizers. The experimental results demonstrate that
our method achieves up to two orders of magnitude of improvement in testing
accuracy.
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the true solution is close to that of the SDE whose
parameters are approximated by our neural network. Our work contributes to SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
( 2
min )
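A toy instance of the idea of recasting time-dependent parameter estimation as maximum-likelihood optimization: recover a drift function from noisy Gaussian observations, where MLE reduces to least squares in a small basis. The setup and names below are illustrative; the paper trains neural networks for SDEs rather than fitting a fixed basis:

```python
import numpy as np

# Observations y_t ~ N(theta(t), sigma^2) with an unknown drift theta(t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
theta_true = np.sin(2.0 * np.pi * t)
y = theta_true + 0.1 * rng.standard_normal(t.size)

# Parametrize theta(t) in a small basis; for Gaussian noise, maximizing
# the likelihood is exactly least squares in the basis coefficients.
B = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
theta_hat = B @ coef
print(np.max(np.abs(theta_hat - theta_true)))  # small estimation error
```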
We introduce a new framework that uses a Long Short-Term Memory (LSTM)
network to detect perceptual bugs in video games as anomalies. The detected
buggy frames are then clustered to determine the category of the bug that
occurred. The framework was evaluated on two First Person Shooter (FPS) games,
and the results show its effectiveness.
( 2
min )
Cardiovascular diseases, particularly heart failure, are a leading cause of
death globally. The early detection of heart failure through routine
echocardiogram screenings is often impeded by the high cost and labor-intensive
nature of these procedures, a barrier that can mean the difference between life
and death. This paper presents ConFormer, a novel deep learning model designed
to automate the estimation of Ejection Fraction (EF) and Left Ventricular Wall
Thickness from echocardiograms. The implementation of ConFormer has the
potential to enhance preventative cardiology by enabling cost-effective,
accessible, and comprehensive heart health monitoring, thereby saving countless
lives. The source code is available at https://github.com/Aether111/ConFormer.
( 2
min )
Hypernetworks are meta neural networks that generate weights for a main
neural network in an end-to-end differentiable manner. Despite extensive
applications ranging from multi-task learning to Bayesian deep learning, the
problem of optimizing hypernetworks has not been studied to date. We observe
that classical weight initialization methods like Glorot & Bengio (2010) and He
et al. (2015), when applied directly to a hypernet, fail to produce weights for
the mainnet in the correct scale. We develop principled techniques for weight
initialization in hypernets, and show that they lead to more stable mainnet
weights, lower training loss, and faster convergence.
( 2
min )
In this paper, we propose a novel personalized decision support system that
combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning
(XRL) to provide effective and interpretable interventions. Our method
leverages DRL to provide expert action recommendations while incorporating ToM
modeling to understand users' mental states and predict their future actions,
enabling appropriate timing for intervention. To explain interventions, we use
counterfactual explanations based on RL's feature importance and users' ToM
model structure. Our proposed system generates accurate and personalized
interventions that are easily interpretable by end-users. We demonstrate the
effectiveness of our approach through a series of crowd-sourcing experiments in
a simulated team decision-making task, where our system outperforms control
baselines in terms of task performance. Our proposed approach is agnostic to
task environment and RL model structure, therefore has the potential to be
generalized to a wide range of applications.
( 2
min )
In many applications, such as scientific literature management, researcher
search, and social network analysis, Name Disambiguation (aiming at
disambiguating WhoIsWho) has been a challenging problem. Moreover, the
growth of scientific literature makes the problem more difficult and urgent.
Although name disambiguation has been extensively studied in academia and
industry, the problem has not been solved well due to the clutter of data and
the complexity of the same name scenario. In this work, we aim to explore
models that can perform the task of name disambiguation using the network
structure that is intrinsic to the problem and present an analysis of the
models.
( 2
min )
The high dimensionality and complexity of neuroimaging data necessitate large
datasets to develop robust and high-performing deep learning models. However,
the neuroimaging field is notably hampered by the scarcity of such datasets. In
this work, we proposed a data augmentation and validation framework that
utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to
enrich datasets. We extended multivariate time series data by predicting the
time courses of independent component networks (ICNs) in both one-step and
recursive configurations. The effectiveness of these augmented datasets was
then compared with the original data using various deep learning models
designed for chronological age prediction tasks. The results suggest that our
approach improves model performance, providing a robust solution to overcome
the challenges presented by the limited size of neuroimaging datasets.
( 2
min )
Motivated by policy gradient methods in the context of reinforcement
learning, we derive the first large deviation rate function for the iterates
generated by stochastic gradient descent for possibly non-convex objectives
satisfying a Polyak-Lojasiewicz condition. Leveraging the contraction principle
from large deviations theory, we illustrate the potential of this result by
showing how convergence properties of policy gradient with a softmax
parametrization and an entropy regularized objective can be naturally extended
to a wide spectrum of other policy parametrizations.
( 2
min )
We study Off-Policy Evaluation (OPE) in contextual bandit settings with large
action spaces. The benchmark estimators suffer from severe bias and variance
tradeoffs. Parametric approaches suffer from bias due to the difficulty of
specifying the correct model, whereas importance-weighting approaches suffer
from variance. To
overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was
proposed to mitigate the estimator's variance via embeddings of an action.
Nevertheless, MIPS is unbiased only under the no-direct-effect assumption, which requires that
the action embedding completely mediates the effect of an action on a reward.
To overcome the dependency on these unrealistic assumptions, we propose a
Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the
proposed estimator is unbiased under weaker assumptions than MIPS while
reducing the variance relative to MIPS. Empirical experiments verify the
superiority of MDR over existing estimators with large action spaces.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by simultaneously including in the
training phase (i) physical dependencies of the spatial loss field and (ii)
pathloss values measured in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors behave
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
( 2
min )
Gaussian process regression is a classical kernel method for function
estimation and data interpolation. In large data applications, computational
costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the
interpolation error. We introduce a unified framework to analyze Gaussian
process regression under important classes of computational misspecification:
Karhunen-Lo\`eve expansions that result in low-rank kernel approximations,
multiscale wavelet expansions that induce sparsity in the covariance matrix,
and finite element representations that induce sparsity in the precision
matrix. Our theory also accounts for epistemic misspecification in the choice
of kernel parameters.
( 2
min )
This paper considers the problem of evaluating an autonomous system's
competency in performing a task, particularly when working in dynamic and
uncertain environments. The inherent opacity of machine learning models, often
described by users as a `black box', poses a challenge. To overcome this, we
propose using a measure called the Surprise
index, which leverages available measurement data to quantify whether the
dynamic system performs as expected. We show that the surprise index can be
computed in closed form for dynamic systems when the joint distribution of the
observed evidence in the probabilistic model is multivariate Gaussian. We then
apply it to a nonlinear
spacecraft maneuver problem, where actions are chosen by a reinforcement
learning agent and show it can indicate how well the trajectory follows the
required orbit.
( 2
min )
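One common closed-form surprise measure under a Gaussian model uses the Mahalanobis distance: the probability that a random draw is more likely than the observed evidence. The definition and normalization below are illustrative and may differ from the paper's surprise index; the bivariate case is used because it has a simple closed form:

```python
import numpy as np

def surprise_index_2d(x, mean, cov):
    """Surprise of evidence x under a 2-D Gaussian N(mean, cov): the
    probability that a random draw is *more* likely than x. For a
    bivariate Gaussian this is 1 - exp(-m2/2), where m2 is the squared
    Mahalanobis distance. (Illustrative definition only.)"""
    diff = x - mean
    m2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return 1.0 - np.exp(-m2 / 2.0)

mean, cov = np.zeros(2), np.eye(2)
print(surprise_index_2d(np.zeros(2), mean, cov))           # 0.0: as expected
print(surprise_index_2d(np.array([4.0, 4.0]), mean, cov))  # ~1.0: surprising
```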
Predictive Process Monitoring (PPM) aims at leveraging historic process
execution data to predict how ongoing executions will continue up to their
completion. In recent years, PPM techniques for the prediction of the next
activities have matured significantly, mainly thanks to the use of Neural
Networks (NNs) as a predictor. While their performance is difficult to beat in
the general case, there are specific situations where background process
knowledge can be helpful. Such knowledge can be leveraged for improving the
quality of predictions for exceptional process executions or when the process
changes due to a concept drift. In this paper, we present a Symbolic[Neuro]
system that leverages background knowledge expressed in terms of a procedural
process model to offset the under-sampling in the training data. More
specifically, we make predictions using NNs with attention mechanism, an
emerging technology in the NN field. The system has been tested on several
real-life logs showing an improvement in the performance of the prediction
task.
( 2
min )
A large amount of effort has recently been put into understanding the barren
plateau phenomenon. In this perspective article, we face the increasingly loud
elephant in the room and ask a question that has been hinted at by many but not
explicitly addressed: Can the structure that allows one to avoid barren
plateaus also be leveraged to efficiently simulate the loss classically? We
present strong evidence that commonly used models with provable absence of
barren plateaus are also classically simulable, provided that one can collect
some classical data from quantum devices during an initial data acquisition
phase. This follows from the observation that barren plateaus result from a
curse of dimensionality, and that current approaches for solving them end up
encoding the problem into some small, classically simulable, subspaces. This
sheds serious doubt on the non-classicality of the information processing
capabilities of parametrized quantum circuits for barren plateau-free
landscapes and on the possibility of superpolynomial advantages from running
them on quantum hardware. We end by discussing caveats in our arguments, the
role of smart initializations, and by highlighting new opportunities that our
perspective raises.
( 3
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
( 2
min )
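The accept-reject correction can be illustrated with the Euclidean analogue, the classical Metropolis-adjusted Langevin algorithm. This is only a sketch of how the filter removes discretisation bias; the paper's proposal uses a mirror Langevin step under a self-concordant mirror map, which is not shown here:

```python
import numpy as np

def mala(logp, grad_logp, x0, step, n_steps, seed=0):
    """Metropolis-adjusted Langevin: a Langevin proposal plus an
    accept-reject filter, so the chain targets exp(logp) exactly instead
    of carrying the asymptotic bias of the unadjusted discretisation."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    out = np.empty((n_steps,) + x.shape)

    def logq(a, b):  # log density of proposing a from b (up to a constant)
        d = a - b - step * grad_logp(b)
        return -np.sum(d * d) / (4.0 * step)

    for t in range(n_steps):
        prop = x + step * grad_logp(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        log_alpha = logp(prop) + logq(x, prop) - logp(x) - logq(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        out[t] = x
    return out

# Standard normal target: the adjusted chain matches mean 0, variance 1.
s = mala(lambda x: -0.5 * np.sum(x * x), lambda x: -x, np.zeros(1), 0.5, 5000)
print(s[500:].mean(), s[500:].var())
```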
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical […]
( 9
min )
“Minimum viewing time” benchmark gauges image recognition complexity for AI systems by measuring the time needed for accurate human identification.
( 11
min )
Using generative AI, MIT chemists created a model that can predict the structures formed when a chemical reaction reaches its point of no return.
( 9
min )
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we will discuss three types of applications, namely, RL as an
alternative way for generation without specified objectives; as a way for
generating outputs while concurrently maximizing an objective function; and,
finally, as a way of embedding desired characteristics, which cannot be easily
captured by means of an objective function, into the generative process. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
( 2
min )
In distributed training, communication often emerges as a bottleneck. In
response, we introduce Kimad, a solution that offers adaptive gradient
compression. By consistently monitoring bandwidth, Kimad refines compression
ratios to match specific neural network layer requirements. Our exhaustive
tests and proofs confirm Kimad's outstanding performance, establishing it as a
benchmark in adaptive compression for distributed deep learning.
( 2
min )
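A minimal sketch of the kind of compression Kimad adapts: top-k gradient sparsification at a bandwidth-dependent ratio. The function and ratio here are illustrative; the real system monitors bandwidth and selects a per-layer ratio:

```python
import numpy as np

def topk_compress(grad, ratio):
    """Keep only the largest-magnitude `ratio` fraction of gradient
    entries, zeroing the rest before communication."""
    k = max(1, int(ratio * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of top-k magnitudes
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out

g = np.array([0.1, -5.0, 2.0, 0.01])
print(topk_compress(g, 0.5))  # [ 0. -5.  2.  0.]: half the entries survive
```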
Quantum neural networks (QNNs) and quantum kernels stand as prominent figures
in the realm of quantum machine learning, poised to leverage the nascent
capabilities of near-term quantum computers to surmount classical machine
learning challenges. Nonetheless, the training efficiency challenge poses a
limitation on both QNNs and quantum kernels, curbing their efficacy when
applied to extensive datasets. To confront this concern, we present a unified
approach: coreset selection, aimed at expediting the training of QNNs and
quantum kernels by distilling a judicious subset from the original training
dataset. Furthermore, we analyze the generalization error bounds of QNNs and
quantum kernels when trained on such coresets, unveiling the comparable
performance with those training on the complete original dataset. Through
systematic numerical simulations, we illuminate the potential of coreset
selection in expediting tasks encompassing synthetic data classification,
identification of quantum correlations, and quantum compiling. Our work offers
a useful way to improve diverse quantum machine learning models with a
theoretical guarantee while reducing the training cost.
( 2
min )
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap, the class imbalance, and
the influence of differing capture settings between the HPA and HubMAP
datasets. The presented approach achieves results comparable with the state of
the art in functional tissue unit segmentation at the cellular level. The source code is
available at https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2
min )
We consider decentralized learning for zero-sum games, where players only see
their payoff information and are agnostic to actions and payoffs of the
opponent. Previous works demonstrated convergence to a Nash equilibrium in this
setting using double time-scale algorithms under strong reachability
assumptions. We address the open problem of achieving an approximate Nash
equilibrium efficiently with an uncoupled and single time-scale algorithm under
weaker conditions. Our contribution is a rational and convergent algorithm,
utilizing Tsallis-entropy regularization in a value-iteration-based approach.
The algorithm learns an approximate Nash equilibrium in polynomial time,
requiring only the existence of a policy pair that induces an irreducible and
aperiodic Markov chain, thus considerably weakening past assumptions. Our
analysis leverages negative drift inequalities and introduces novel properties
of Tsallis entropy that are of independent interest.
( 2
min )
This paper extends our previous method for COVID-19 diagnosis, proposing an
enhanced solution for detecting COVID-19 from computed tomography (CT) images.
To decrease model misclassifications, two key steps of image processing were
employed. Firstly, the uppermost and lowermost slices were removed, preserving
sixty percent of each patient's slices. Secondly, all slices underwent manual
cropping to emphasize the lung areas. Subsequently, resized CT scans (224 by
224) were input into an Xception transfer learning model. Leveraging Xception's
architecture and pre-trained weights, the modified model achieved binary
classification. Promising results on the COV19-CT database showcased higher
validation accuracy and macro F1 score at both the slice and patient levels
compared to our previous solution and alternatives on the same dataset.
( 2
min )
Cadastres from the 19th century are a complex as well as rich source for
historians and archaeologists, whose use presents them with great challenges.
For archaeological and historical remote sensing, we have trained several Deep
Learning models, CNNs as well as Vision Transformers, to extract large-scale
data from this knowledge representation. We present the principal results of
our work here, together with a demonstrator of our browser-based tool that
allows researchers and public stakeholders to quickly identify spots that
featured buildings in the 19th century Franciscean Cadastre. The tool not only
supports scholars and fellow researchers in building a better understanding of
the settlement history of the region of Styria; it also helps public
administration and fellow citizens to swiftly identify areas of heightened
sensitivity with regard to the cultural heritage of the region.
( 2
min )
Popular guidance for denoising diffusion probabilistic model (DDPM) linearly
combines distinct conditional models together to provide enhanced control over
samples. However, this approach overlooks nonlinear effects that become
significant when guidance scale is large. To address this issue, we propose
characteristic guidance, a novel method that provides non-linear correction for
classifier-free guided DDPMs. Such correction forces the guided DDPMs to
respect the Fokker-Planck equation of their underlying diffusion process, in a
way that is first-principle, training-free, derivative-free, and compatible
with existing sampling methods. Experiments show that characteristic guidance
is robust to various applications, offers enhanced control over sample
generation, suppresses color and exposure issues even for latent space
sampling, and can handle physics problems such as phase transitions.
( 2
min )
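The linear combination that characteristic guidance corrects is the standard classifier-free guidance rule. A sketch, with illustrative names and values:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: linear extrapolation between the
    unconditional and conditional noise predictions. At large guidance
    scale w, this linear rule ignores the nonlinear effects that
    characteristic guidance corrects for."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eu, ec = np.array([0.0, 1.0]), np.array([1.0, 1.0])
print(cfg_combine(eu, ec, 1.0))  # w=1 recovers the conditional prediction
print(cfg_combine(eu, ec, 7.5))  # large w extrapolates far beyond it
```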
Likelihood-free inference is quickly emerging as a powerful tool to perform
fast/effective parameter estimation. We demonstrate a technique of optimizing
likelihood-free inference to make it even faster by marginalizing symmetries in
a physical problem. In this approach, physical symmetries, for example,
time-translation are learned using joint-embedding via self-supervised learning
with symmetry data augmentations. Subsequently, parameter inference is
performed using a normalizing flow where the embedding network is used to
summarize the data before conditioning the parameters. We present this approach
on two simple physical problems and we show faster convergence in a smaller
number of parameters compared to a normalizing flow that does not use a
pre-trained symmetry-informed representation.
( 2
min )
The utilization of deep learning-based object detection is an effective
approach to assist visually impaired individuals in avoiding obstacles. In this
paper, we implemented seven different YOLO object detection models
\textit{viz}., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and
YOLOv5 and performed comprehensive evaluation with carefully tuned
hyperparameters, to analyze how these models performed on images containing
common daily-life objects presented on roads and sidewalks. After a systematic
investigation, YOLOv8 was found to be the best model, which reached a precision
of $80\%$ and a recall of $68.2\%$ on a well-known Obstacle Dataset which
includes images from VOC dataset, COCO dataset, and TT100K dataset along with
images collected by the researchers in the field. Despite being the latest
model and demonstrating better performance in many other applications, YOLO-NAS
was found to be suboptimal for the obstacle detection task.
( 2
min )
Sleep detection and annotation are crucial for researchers to understand
sleep patterns, especially in children. With modern wrist-worn watches
comprising built-in accelerometers, sleep logs can be collected. However, the
annotation of these logs into distinct sleep events: onset and wakeup, proves
to be challenging. These annotations must be automated, precise, and scalable.
We propose to model the accelerometer data using different machine learning
(ML) techniques such as support vectors, boosting, ensemble methods, and more
complex approaches involving LSTMs and Region-based CNNs. Later, we aim to
evaluate these approaches using the Event Detection Average Precision (EDAP)
score (similar to the IOU metric) to eventually compare the predictive power
and model performance.
( 2
min )
Safeguarding privacy in sensitive training data is paramount, particularly in
the context of generative modeling. Existing approaches rely on either
differentially private stochastic gradient descent or a differentially private
metric for training models or generators. In this paper, we introduce a novel
differentially private generative modeling approach based on parameter-free
gradient flows in the space of probability measures. The proposed algorithm is
a new discretized flow which operates through a particle scheme, utilizing
drift derived from the sliced Wasserstein distance and computed in a private
manner. Our experiments show that compared to a generator-based model, our
proposed model can generate higher-fidelity data at a low privacy budget,
offering a viable alternative to generator-based approaches.
( 2
min )
Influenced mixed moving average fields are a versatile modeling class for
spatio-temporal data. However, their predictive distribution is not generally
known. Under this modeling assumption, we define a novel spatio-temporal
embedding and a theory-guided machine learning approach that employs a
generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz
predictors and determine fixed-time and any-time PAC Bayesian bounds in the
batch learning setting. Performing causal forecasts is a highlight of our
methodology, given its potential application to data with spatial and temporal
short- and long-range dependence. We then test the performance of our learning
methodology by using linear predictors and data sets simulated from a
spatio-temporal Ornstein-Uhlenbeck process.
( 2
min )
The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a
factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix.
RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional
arithmetic operations, and it can be implemented with just a few lines of code.
The method is particularly useful for approximating a kernel matrix.
This paper offers a thorough new investigation of the empirical and
theoretical behavior of this fundamental algorithm. For matrix approximation
problems that arise in scientific machine learning, experiments show that
RPCholesky matches or beats the performance of alternative algorithms.
Moreover, RPCholesky provably returns low-rank approximations that are nearly
optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly
support its use in scientific computing and machine learning applications.
( 2
min )
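The abstract notes that RPCholesky fits in a few lines of code; a sketch of the randomly pivoted update, assuming the standard formulation (variable names illustrative):

```python
import numpy as np

def rpcholesky(A, k, seed=0):
    """Randomly pivoted partial Cholesky: returns F (N x k) with
    A ~= F @ F.T. Each step samples a pivot with probability proportional
    to the diagonal of the current residual, touching one column of A."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.diag(A).astype(float).copy()   # diagonal of the residual
    for i in range(k):
        if d.sum() <= 0:                  # residual exhausted: already exact
            break
        s = rng.choice(N, p=d / d.sum())  # sample pivot ~ residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]  # residual column at pivot s
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
    return F

# A rank-3 psd matrix is recovered (numerically) exactly with k = 3.
rng = np.random.default_rng(1)
G = rng.standard_normal((30, 3))
A = G @ G.T
F = rpcholesky(A, 3)
print(np.linalg.norm(A - F @ F.T) / np.linalg.norm(A))  # near machine precision
```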
With the rise of voice search, how can businesses adapt their SEO strategies to optimize for conversational queries, backed by data-driven insights? Voice search is causing changes in search engine optimization: users pose more natural-language, conversational queries to voice-activated devices, and businesses need to adjust their SEO strategies for this changing search behavior.
The post Voice Search Revolution: Data-Driven SEO Strategies for Future Success appeared first on Data Science Central.
( 26
min )
The Energy and Climate Hack presented opportunities for students and companies to collaborate and develop innovative solutions.
( 8
min )
Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, […]
( 16
min )
This is a customer post jointly authored by ICL and AWS employees. ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored […]
( 8
min )
Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you […]
( 8
min )
Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging. Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November […]
( 9
min )
This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python, enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to use PySpark.
( 10
min )
This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass. Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership. It’s all Read article >
( 6
min )
We present Cross-Client Label Propagation (XCLP), a new method for
transductive federated learning. XCLP estimates a data graph jointly from the
data of multiple clients and computes labels for the unlabeled data by
propagating label information across the graph. To avoid clients having to
share their data with anyone, XCLP employs two cryptographically secure
protocols: secure Hamming distance computation and secure summation. We
demonstrate two distinct applications of XCLP within federated learning. In the
first, we use it in a one-shot way to predict labels for unseen test points. In
the second, we use it to repeatedly pseudo-label unlabeled training data in a
federated semi-supervised setting. Experiments on both real federated and
standard benchmark datasets show that in both applications XCLP achieves higher
classification accuracy than alternative approaches.
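Secure protocols aside, the transductive core of XCLP is classic graph-based label propagation. Below is a minimal, non-private numpy sketch (the function name and parameters are ours, not the paper's):

```python
import numpy as np

def label_propagation(X, y, n_neighbors=3, alpha=0.9, n_iter=50):
    """Graph-based label propagation; y uses -1 for unlabeled points."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[1:n_neighbors + 1]:   # skip self at index 0
            W[i, j] = W[j, i] = 1.0
    P = W / np.maximum(W.sum(1, keepdims=True), 1e-12)  # row-stochastic

    classes = np.unique(y[y >= 0])
    F = (y[:, None] == classes[None, :]).astype(float)  # one-hot seed labels
    Y0 = F.copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1 - alpha) * Y0          # propagate, clamp to seeds
    return classes[F.argmax(1)]
```

XCLP's contribution is doing the graph estimation and propagation jointly across clients' data, using secure Hamming distance computation and secure summation so no raw data is shared.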
( 2
min )
In this paper, we study the mistake bound of online kernel learning on a
budget. We propose a new budgeted online kernel learning model, called
Ahpatron, which significantly improves the mistake bound of previous work and
resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We
first present an aggressive variant of Perceptron, named AVP, a model without
budget, which uses an active updating rule. Then we design a new budget
maintenance mechanism, which removes half of the examples and projects the
removed examples onto a hypothesis space spanned by the remaining examples.
Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses
prove that Ahpatron has tighter mistake bounds, and experimental results show
that Ahpatron outperforms the state-of-the-art algorithms on the same or a
smaller budget.
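The budget maintenance step, removing half of the support set and projecting the removed part onto the span of the remaining examples, can be illustrated with a toy budgeted kernel Perceptron (a simplified sketch of ours; the update rule and halving schedule are not the paper's exact AVP/Ahpatron procedure):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between row sets A and B."""
    return np.exp(-gamma * ((A[:, None] - B[None, :]) ** 2).sum(-1))

def budgeted_kernel_perceptron(X, y, budget=4):
    """Toy budgeted kernel Perceptron: on overflow, drop the older half of
    the support set and project its contribution onto the span of the rest."""
    S, alpha, mistakes = [], np.zeros(0), 0
    for x, label in zip(X, y):
        score = float(alpha @ rbf(np.array(S), x[None])[:, 0]) if S else 0.0
        if label * score <= 0:              # mistake (or zero margin): update
            mistakes += 1
            S.append(x)
            alpha = np.append(alpha, label)
            if len(S) > budget:
                half = len(S) // 2
                keep, drop = S[half:], np.array(S[:half])
                a_keep, a_drop = alpha[half:], alpha[:half]
                K_kk = rbf(np.array(keep), np.array(keep))
                K_kd = rbf(np.array(keep), drop)
                # Least-squares projection of the dropped part onto kept span.
                a_keep = a_keep + np.linalg.lstsq(K_kk, K_kd @ a_drop,
                                                  rcond=None)[0]
                S, alpha = keep, a_keep
    return S, alpha, mistakes
```

The projection keeps the hypothesis close to the unbudgeted one instead of simply discarding the removed examples' contribution.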
( 2
min )
We present the first optimal rates for infinite-dimensional vector-valued
ridge regression on a continuous scale of norms that interpolate between $L_2$
and the hypothesis space, which we consider as a vector-valued reproducing
kernel Hilbert space. These rates allow us to treat the misspecified case in which
the true regression function is not contained in the hypothesis space. We
combine standard assumptions on the capacity of the hypothesis space with a
novel tensor product construction of vector-valued interpolation spaces in
order to characterize the smoothness of the regression function. Our upper
bound not only attains the same rate as real-valued kernel ridge regression,
but also removes the assumption that the target regression function is bounded.
For the lower bound, we reduce the problem to the scalar setting using a
projection argument. We show that these rates are optimal in most cases and
independent of the dimension of the output space. We illustrate our results for
the special case of vector-valued Sobolev spaces.
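For intuition, vector-valued ridge regression with a separable kernel $k(x,x')\,\mathrm{Id}$ reduces to one scalar kernel ridge regression shared across all output coordinates, so the output dimension drops out of the computation, mirroring the dimension-independence of the rates (a minimal sketch under that separable-kernel assumption; the paper's interpolation-space analysis is far more general):

```python
import numpy as np

def vv_krr(X, Y, Xq, gamma=1.0, lam=1e-3):
    """Vector-valued kernel ridge regression with the separable kernel
    k(x, x') * Id: one shared Gram matrix, multi-column targets Y."""
    sq = lambda A, B: ((A[:, None] - B[None, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq(X, X))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)   # (n, out_dim)
    return np.exp(-gamma * sq(Xq, X)) @ alpha
```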
( 2
min )
We propose a novel algorithmic framework for distributional reinforcement
learning, based on learning finite-dimensional mean embeddings of return
distributions. We derive several new algorithms for dynamic programming and
temporal-difference learning based on this framework, provide asymptotic
convergence theory, and examine the empirical performance of the algorithms on
a suite of tabular tasks. Further, we show that this approach can be
straightforwardly combined with deep reinforcement learning, and obtain a new
deep RL agent that improves over baseline distributional approaches on the
Arcade Learning Environment.
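With polynomial features $\phi(g) = (g, g^2)$, the mean embedding of the return distribution satisfies an exact Bellman-style backup, which gives a tiny tabular instance of the framework (our own illustrative reduction to first and second moments, not the paper's algorithms):

```python
import numpy as np

def moment_dp(P, R, gamma, n_iter=500):
    """Distributional DP with mean embeddings phi(g) = (g, g^2).

    P[s, s'] are transition probabilities, R[s] is the reward on leaving s.
    Returns the first and second moments of the random return from each state.
    """
    n = len(R)
    m1, m2 = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        # Backup of the embedding of G = R[s] + gamma * G':
        new_m1 = R + gamma * (P @ m1)
        new_m2 = R**2 + 2 * gamma * R * (P @ m1) + gamma**2 * (P @ m2)
        m1, m2 = new_m1, new_m2
    return m1, m2
```

The fixed point of `m1` is the ordinary value function; `m2 - m1**2` gives the variance of the return, information a scalar value function cannot provide.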
( 2
min )
We present ELSA, a practical solution for creating deep networks that can
easily be deployed at different levels of sparsity. The core idea is to embed
one or more sparse networks within a single dense network as a proper subset of
the weights. At prediction time, any sparse model can be extracted effortlessly
by zeroing out weights according to a predefined mask. ELSA is simple,
powerful and highly flexible. It can use essentially any existing technique for
network sparsification and network training. In particular, it does not
restrict the loss function, architecture or the optimization technique. Our
experiments show that ELSA's flexible deployment comes with no, or only a
negligible, reduction in prediction quality compared to the standard way
of using multiple sparse networks that are trained and stored independently.
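Global magnitude pruning produces naturally nested masks, which is what allows one dense weight tensor to host several sparsity levels; extraction then really is just a multiply (an illustrative sketch; ELSA itself is agnostic to how the masks are obtained):

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of largest-|w| entries."""
    k = max(int(round((1 - sparsity) * w.size)), 1)
    thresh = np.sort(np.abs(w).ravel())[::-1][k - 1]
    return (np.abs(w) >= thresh).astype(w.dtype)

# Masks at increasing sparsity are nested subsets, so any level is
# recoverable from the single dense tensor by zeroing: w_sparse = w * mask.
```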
( 2
min )
This paper presents a novel methodology for improving the performance of
machine learning based space traffic management tasks through the use of a
pre-trained orbit model. Taking inspiration from BERT-like self-supervised
language models in the field of natural language processing, we introduce
ORBERT, and demonstrate the ability of such a model to leverage large
quantities of readily available orbit data to learn meaningful representations
that can be used to aid in downstream tasks. As a proof of concept of this
approach we consider the task of all vs. all conjunction screening, phrased
here as a machine learning time series classification task. We show that
leveraging unlabelled orbit data leads to improved performance, and that the
proposed approach can be particularly beneficial for tasks where the
availability of labelled data is limited.
( 2
min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
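The signed volumes that the wedge-product features encode are determinants of edge-difference matrices; a generic numpy sketch (not the paper's construction):

```python
import numpy as np

def signed_volume(points):
    """Signed volume of the parallelotope spanned by the edges
    points[i] - points[0]; in 2D this is twice the signed triangle area."""
    p = np.asarray(points, dtype=float)
    return float(np.linalg.det(p[1:] - p[0]))
```

The sign encodes orientation, which is the geometric information the convex reformulation selects among via $\ell_1$ regularization.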
( 2
min )
The low-level spatial detail information and high-level semantic abstract
information are both essential to the semantic segmentation task. The features
extracted by the deep network can obtain rich semantic information, while a lot
of spatial information is lost. However, how to recover spatial detail
information effectively and fuse it with high-level semantics has not been well
addressed so far. In this paper, we propose a new architecture based on
Bilateral Segmentation Network (BiseNet) called Multi-scale Covariance Feature
Fusion Network (MCFNet). Specifically, this network introduces a new feature
refinement module and a new feature fusion module. Furthermore, a gating unit
named L-Gate is proposed to filter out invalid information and fuse multi-scale
features. We evaluate our proposed model on Cityscapes, CamVid datasets and
compare it with state-of-the-art methods. Extensive experiments show that
our method achieves competitive results. On Cityscapes, we achieve 75.5% mIoU
at a speed of 151.3 FPS.
( 2
min )
In this paper, we provide novel tail bounds on the optimization error of
Stochastic Mirror Descent for convex and Lipschitz objectives. Our analysis
extends the existing tail bounds from the classical light-tailed Sub-Gaussian
noise case to heavier-tailed noise regimes. We study the optimization error of
the last iterate as well as the average of the iterates. We instantiate our
results in two important cases: a class of noise with exponential tails and one
with polynomial tails. A remarkable feature of our results is that they do not
require an upper bound on the diameter of the domain. Finally, we support our
theory with illustrative experiments that compare the behavior of the average
of the iterates with that of the last iterate in heavy-tailed noise regimes.
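On the probability simplex with the negative-entropy mirror map, SMD becomes the exponentiated-gradient update; below is a sketch that tracks both the last iterate and the running average, the two quantities the bounds compare (names and the toy objective are ours):

```python
import numpy as np

def smd_simplex(grad_fn, n, steps, lr, rng):
    """Stochastic Mirror Descent with the negative-entropy mirror map:
    multiplicative-weights / exponentiated-gradient updates on the simplex."""
    x = np.full(n, 1.0 / n)
    avg = np.zeros(n)
    for t in range(steps):
        g = grad_fn(x, rng)         # stochastic (sub)gradient at x
        x = x * np.exp(-lr * g)     # gradient step in the dual (mirror) space
        x /= x.sum()                # maps back onto the simplex
        avg += (x - avg) / (t + 1)  # running average of the iterates
    return x, avg
```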
( 2
min )
The graduate students will aim to commercialize innovations in AI, machine learning, and data science.
( 8
min )
Study shows computational models trained to perform auditory tasks display an internal organization similar to that of the human auditory cortex.
( 9
min )
A new method enables optical devices that more closely match their design specifications, boosting accuracy and efficiency.
( 10
min )
Zipline isn’t just some pie-in-the-sky drone startup. The San Francisco-based company has completed more than 800,000 deliveries in seven countries since its start in 2011. It recently added services for Seattle’s Pagliacci Pizza, vitamin and supplement giant GNC, and large health systems like Intermountain Health, OhioHealth and Michigan Medicine. Zipline developed its drones — which Read article >
( 6
min )
Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless. In this post, we explore how to use Amazon […]
( 8
min )
In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by […]
( 10
min )
Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional […]
( 13
min )
Axel Springer is the first publishing house globally to partner with us on a deeper integration of journalism in AI technologies.
( 2
min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, compared to lossless convexification, from the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2
min )
We introduce a curriculum learning algorithm, Variational Automatic
Curriculum Learning (VACL), for solving challenging goal-conditioned
cooperative multi-agent reinforcement learning problems. We motivate our
paradigm through a variational perspective, where the learning objective can be
decomposed into two terms: task learning on the current task distribution, and
curriculum update to a new task distribution. Local optimization over the
second term suggests that the curriculum should gradually expand the training
tasks from easy to hard. Our VACL algorithm implements this variational
paradigm with two practical components, task expansion and entity progression,
which produces training curricula over both the task configurations as well as
the number of entities in the task. Experiment results show that VACL solves a
collection of sparse-reward problems with a large number of agents.
Particularly, using a single desktop machine, VACL achieves 98% coverage rate
with 100 agents in the simple-spread benchmark and reproduces the ramp-use
behavior originally shown in OpenAI's hide-and-seek project. Our project
website is at https://sites.google.com/view/vacl-neurips-2021.
( 2
min )
Multilinear Principal Component Analysis (MPCA) is a widely utilized method
for the dimension reduction of tensor data. However, the integration of MPCA
into federated learning remains unexplored in existing research. To tackle this
gap, this article proposes a Federated Multilinear Principal Component Analysis
(FMPCA) method, which enables multiple users to collaboratively reduce the
dimension of their tensor data while keeping each user's data local and
confidential. The proposed FMPCA method is guaranteed to have the same
performance as traditional MPCA. An application of the proposed FMPCA in
industrial prognostics is also demonstrated. Simulated data and a real-world
data set are used to validate the performance of the proposed method.
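Since FMPCA is guaranteed to match traditional MPCA, the centralized baseline is the reference point: alternating eigendecompositions of mode-wise scatter matrices. Below is our sketch of plain MPCA (without centering); a federated variant would aggregate the per-user scatter statistics without pooling raw tensors:

```python
import numpy as np

def mpca(tensors, ranks, n_iter=5):
    """Centralized MPCA sketch: alternate over modes, projecting all other
    modes and eigendecomposing the resulting mode-wise scatter matrix."""
    X = np.asarray(tensors, dtype=float)        # shape (N, I1, ..., IM)
    N, dims = X.shape[0], X.shape[1:]
    U = [np.eye(d)[:, :r] for d, r in zip(dims, ranks)]
    for _ in range(n_iter):
        for m in range(len(dims)):
            Y = X
            for k in range(len(dims)):
                if k != m:                      # project every mode except m
                    Y = np.moveaxis(
                        np.tensordot(Y, U[k], axes=([k + 1], [0])), -1, k + 1)
            Ym = np.moveaxis(Y, m + 1, 1).reshape(N, dims[m], -1)
            S = sum(A @ A.T for A in Ym)        # mode-m scatter matrix
            vals, vecs = np.linalg.eigh(S)
            U[m] = vecs[:, ::-1][:, :ranks[m]]  # top eigenvectors
    return U
```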
( 2
min )
This paper presents a novel algorithm that leverages Stochastic Gradient
Descent strategies in conjunction with Random Features to augment the
scalability of Conic Particle Gradient Descent (CPGD) specifically tailored for
solving sparse optimisation problems on measures. By formulating the CPGD steps
within a variational framework, we provide rigorous mathematical proofs
demonstrating the following key findings: (i) The total variation norms of the
solution measures along the descent trajectory remain bounded, ensuring
stability and preventing undesirable divergence; (ii) We establish a global
convergence guarantee with a convergence rate of
$\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ iterations, showcasing the efficiency
and effectiveness of our algorithm; (iii) Additionally, we analyze and
establish local control over the first-order condition discrepancy,
contributing to a deeper understanding of the algorithm's behavior and
reliability in practical applications.
( 2
min )
Differentiating noisy, discrete measurements in order to fit an ordinary
differential equation can be unreasonably effective. Assuming square-integrable
noise and minimal flow regularity, we construct and analyze a finite-difference
differentiation filter and a Tikhonov-regularized least squares estimator for
the continuous-time parameter-linear system. Combining these contributions in
series, we obtain a finite-sample bound on mean absolute error of estimation.
As a by-product, we offer a novel analysis of stochastically perturbed
Moore-Penrose pseudoinverses.
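The two-stage pipeline, finite-differencing the measurements and then solving a Tikhonov-regularized least squares problem for the parameter-linear system, fits in a few lines (an illustrative sketch with our own names, not the paper's exact filter):

```python
import numpy as np

def fit_parameter_linear_ode(t, y, features, lam=1e-3):
    """Estimate theta in x'(t) = features(x(t)) @ theta from noisy samples:
    central finite differences, then Tikhonov-regularized least squares."""
    dy = (y[2:] - y[:-2]) / (t[2:] - t[:-2])   # central-difference derivative
    Phi = features(y[1:-1])                    # design matrix at interior nodes
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ dy)
```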
( 2
min )
To address the bias of the canonical two-way fixed effects estimator for
difference-in-differences under staggered adoptions, Wooldridge (2021) proposed
the extended two-way fixed effects estimator, which adds many parameters.
However, this reduces efficiency. Restricting some of these parameters to be
equal helps, but ad hoc restrictions may reintroduce bias. We propose a machine
learning estimator with a single tuning parameter, fused extended two-way fixed
effects (FETWFE), that enables automatic data-driven selection of these
restrictions. We prove that under an appropriate sparsity assumption FETWFE
identifies the correct restrictions with probability tending to one. We also
prove the consistency, asymptotic normality, and oracle efficiency of FETWFE
for two classes of heterogeneous marginal treatment effect estimators under
either conditional or marginal parallel trends, and we prove consistency for
two classes of conditional average treatment effects under conditional parallel
trends. We demonstrate FETWFE in simulation studies and an empirical
application.
( 2
min )
Phi-2 is now accessible on the Azure model catalog. Its compact size and new innovations in model scaling and training data curation make it ideal for exploration around mechanistic interpretability, safety improvements, and fine-tuning experimentation on a variety of tasks.
The post Phi-2: The surprising power of small language models appeared first on Microsoft Research.
( 11
min )
The launch of ChatGPT and rise in popularity of generative AI have captured the imagination of customers who are curious about how they can use this technology to create new products and services on AWS, such as enterprise chatbots, which are more conversational. This post shows you how you can create a web UI, which […]
( 9
min )
Large language models (or LLMs) have become a topic of daily conversations. Their quick adoption is evident in the time required to reach 100 million users, which has gone from “4.5yrs by facebook” to an all-time low of a mere “2 months by ChatGPT.” A generative pre-trained transformer (GPT) uses causal autoregressive updates […]
( 7
min )
Vodafone is transitioning from a telecommunications company (telco) to a technology company (TechCo) by 2025, with objectives of innovating faster, reducing costs, improving security, and simplifying operations. Thousands of engineers are being onboarded to contribute to this transition. By 2025, Vodafone plans to have 50% of its global workforce actively involved in software development, with […]
( 6
min )
Justin Solomon applies modern geometric techniques to solve problems in computer vision, machine learning, statistics, and beyond.
( 10
min )
The creative team at Moonshine Studio — an artist-focused visual effects (VFX) studio specializing in animation and motion design — was tasked to solve a problem.
( 7
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of first-order logic. Namely, any
query of the two variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barcelo & Al., 2020, Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such a hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as Rectified Linear Units) and answers an open
question formulated by [Grohe, 2021].
( 2
min )
In the era of artificial intelligence, data is gold but costly to annotate.
This paper demonstrates a solution to this dilemma: using ChatGPT
for text augmentation in sentiment analysis. We leverage ChatGPT's generative
capabilities to create synthetic training data that significantly improves the
performance of smaller models, making them competitive with, or even
outperforming, their larger counterparts. This innovation enables models to be
both efficient and effective, thereby reducing computational cost, inference
time, and memory usage without compromising on quality. Our work marks a key
advancement in the cost-effective development and deployment of robust
sentiment analysis models.
( 2
min )
The Chinese Space Station Telescope (abbreviated as CSST) is a future
advanced space telescope. Real-time identification of galaxy and nebula/star
cluster (abbreviated as NSC) images is of great value during the CSST survey. While
recent research on celestial object recognition has progressed, the rapid and
efficient identification of high-resolution local celestial images remains
challenging. In this study, we conducted galaxy and NSC image classification
research using deep learning methods based on data from the Hubble Space
Telescope. We built a Local Celestial Image Dataset and designed a deep
learning model named HR-CelestialNet for classifying images of the galaxy and
NSC. HR-CelestialNet achieved an accuracy of 89.09% on the testing set,
outperforming models such as AlexNet, VGGNet and ResNet, while demonstrating
faster recognition speeds. Furthermore, we investigated the factors influencing
CSST image quality and evaluated the generalization ability of HR-CelestialNet
on the blurry image dataset, demonstrating its robustness to low image quality.
The proposed method can enable real-time identification of celestial images
during the CSST survey mission.
( 2
min )
Assurance Cases (ACs) are an established approach in safety engineering to
argue quality claims in a structured way. In the context of quality assurance
for Machine Learning (ML)-based software components, ACs are also being
discussed and appear promising. Tools for operationalizing ACs do exist, yet
mainly focus on supporting safety engineers on the system level. However,
assuring the quality of an ML component within the system is commonly the
responsibility of data scientists, who are usually less familiar with these
tools. To address this gap, we propose a framework to support the
operationalization of ACs for ML components based on technologies that data
scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to
make the process of creating ML-related evidence in ACs more effective. Results
from the application of the framework, documented through notebooks, can be
integrated into existing AC tools. We illustrate the application of the
framework on an example excerpt concerned with the quality of the test data.
( 3
min )
Training generative models to produce synthetic data is meant to provide a
privacy-friendly approach to data release. However, we get robust guarantees
only when models are trained to satisfy Differential Privacy (DP). Alas, this
is not the standard in industry as many companies use ad-hoc strategies to
empirically evaluate privacy based on the statistical similarity between
synthetic and real data. In this paper, we review the privacy metrics offered
by leading companies in this space and shed light on a few critical flaws in
reasoning about privacy entirely via empirical evaluations. We analyze the
undesirable properties of the most popular metrics and filters and demonstrate
their unreliability and inconsistency through counter-examples. We then present
a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all
attributes of) at least 78% of the low-density train records (or outliers) with
only black-box access to a single fitted generative model and the privacy
metrics. Finally, we show that applying DP only to the model or using
low-utility generators does not mitigate ReconSyn as the privacy leakage
predominantly comes from the metrics. Overall, our work serves as a warning to
practitioners not to deviate from established privacy-preserving mechanisms.
( 2
min )
Communication networks able to withstand hostile environments are critically
important for disaster relief operations. In this paper, we consider a
challenging scenario where drones have been compromised in the supply chain,
during their manufacture, and harbour malicious software capable of
wide-ranging and infectious disruption. We investigate multi-agent deep
reinforcement learning as a tool for learning defensive strategies that
maximise communications bandwidth despite continual adversarial interference.
Using a public challenge for learning network resilience strategies, we propose
a state-of-the-art expert technique and study its superiority over deep
reinforcement learning agents. Correspondingly, we identify three specific
methods for improving the performance of our learning-based agents: (1)
ensuring each observation contains the necessary information, (2) using expert
agents to provide a curriculum for learning, and (3) paying close attention to
reward. We apply our methods and present a new mixed strategy enabling expert
and learning-based agents to work together and improve on all prior results.
( 2
min )
Can we learn policies in reinforcement learning without rewards? Can we learn
a policy just by trying to reach a goal state? We answer these questions
positively by proposing a multi-step procedure that first learns a world model
that goes backward in time, secondly generates goal-reaching backward
trajectories, thirdly improves those sequences using shortest path finding
algorithms, and finally trains a neural network policy by imitation learning.
We evaluate our method on a deterministic maze environment where the
observations are $64\times 64$ pixel bird's-eye images, and we show that it
consistently reaches several goals.
( 2
min )
SCGAN adds a similarity constraint between generated images and conditions as
a regularization term on generative adversarial networks. Similarity constraint
works as a tutor to instruct the generator network to comprehend the difference
of representations based on conditions. We analyze how SCGAN works at a
deeper level and find that the similarity constraint functions like a
contrastive loss. We believe that a
model with high understanding and intelligence measures the similarity between
images based on their structure and high level features, just like humans do.
We apply two major changes to SCGAN to obtain a modified model:
using SSIM to measure similarity between images and applying contrastive loss
principles to the similarity constraint. The modified model performs better
using FID and FactorVAE metrics. The modified model also generalises better
than other models.
Keywords: Generative Adversarial Nets, Unsupervised Learning, Disentangled
Representation Learning, Contrastive Disentanglement, SSIM
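A single-window (global) SSIM already shows the structural-similarity idea the modified constraint relies on (standard SSIM averages over local windows; this stripped-down version and its constants are ours):

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM computed over the whole image as one window, data range [0, 1]:
    compares luminance (means), contrast (variances), and structure (covariance)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```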
( 2
min )
The discovery of neural architectures from simple building blocks is a
long-standing goal of Neural Architecture Search (NAS). Hierarchical search
spaces are a promising step towards this goal but lack a unifying search space
design framework and typically only search over some limited aspect of
architectures. In this work, we introduce a unifying search space design
framework based on context-free grammars that can naturally and compactly
generate expressive hierarchical search spaces that are 100s of orders of
magnitude larger than common spaces from the literature. By enhancing and using
their properties, we effectively enable search over the complete architecture
and can foster regularity. Further, we propose an efficient hierarchical kernel
design for a Bayesian Optimization search strategy to efficiently search over
such huge spaces. We demonstrate the versatility of our search space design
framework and show that our search strategy can be superior to existing NAS
approaches. Code is available at
https://github.com/automl/hierarchical_nas_construction.
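Generating architectures from a context-free grammar can be sketched in a few lines (the toy grammar and names here are ours; the repository above defines the real search spaces):

```python
import random

# Toy context-free grammar over architecture terms; nonterminals are the
# dict keys, every other token is a terminal.
GRAMMAR = {
    "ARCH": [["seq(", "BLOCK", ",", "BLOCK", ")"]],
    "BLOCK": [["conv3x3"], ["conv1x1"],
              ["seq(", "BLOCK", ",", "BLOCK", ")"],
              ["res(", "BLOCK", ")"]],
}

def sample(symbol="ARCH", rng=random, depth=0, max_depth=5):
    """Sample one architecture string by recursively expanding the grammar."""
    if symbol not in GRAMMAR:
        return symbol                     # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                # force termination deep in the tree
        terminal_only = [r for r in rules
                         if all(s not in GRAMMAR for s in r)]
        rules = terminal_only or rules    # fall back if none exist
    rule = rng.choice(rules)
    return "".join(sample(s, rng, depth + 1, max_depth) for s in rule)
```

Because recursion compounds choices at every expansion, even this tiny grammar generates a combinatorially large space, which is the effect the paper exploits at scale.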
( 2
min )
We’re proud to have 100+ accepted papers at NeurIPS 2023, plus 18 workshops. Several submissions were chosen as oral presentations and spotlight posters, reflecting groundbreaking concepts, methods, or applications. Here’s an overview of those submissions.
The post NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation appeared first on Microsoft Research.
( 16
min )
The series aims to help policymakers create better oversight of AI in society.
( 12
min )
In today’s digital marketing world, things are changing fast, and artificial intelligence (AI) is a big part of that. Companies want to stay ahead, so they’re smartly choosing to get help from outside experts in digital marketing who use AI tools. This helps them make the most of what AI can do. AI is like… Read More »Maximizing marketing potential: The AI-driven revolution in outsourced digital marketing
The post Maximizing marketing potential: The AI-driven revolution in outsourced digital marketing appeared first on Data Science Central.
( 22
min )
Much has been said about the economic impact of AGI, and some of it is already being felt. But not much has been proposed about solutions. Specifically, what approaches should policy makers take? Here, I propose that policy makers should encourage two key trends that together could alleviate the issues of AI: the gig economy and… Read More »Universal basic income and the gig economy: A combined policy approach to alleviate the challenges of AI
The post Universal basic income and the gig economy: A combined policy approach to alleviate the challenges of AI appeared first on Data Science Central.
( 21
min )
Great companies thrive on stories. Sid Siddeek, who runs NVIDIA’s venture capital arm, knows this well. Siddeek still remembers one of his first jobs, schlepping presentation materials from one investor meeting to another, helping the startup’s CEO and management team get the story out while working from a trailer that “shook when the door opened,” Read article >
( 7
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” we outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure those more just and… Read More »AI and Justice in a Brave New World Part 2 – Humanizing AI
The post AI and Justice in a Brave New World Part 2 – Humanizing AI appeared first on Data Science Central.
( 22
min )
Finding classifiers robust to adversarial examples is critical for their safe
deployment. Determining the robustness of the best possible classifier under a
given threat model for a given data distribution and comparing it to that
achieved by state-of-the-art training methods is thus an important diagnostic
tool. In this paper, we find achievable information-theoretic lower bounds on
loss in the presence of a test-time attacker for multi-class classifiers on any
discrete dataset. We provide a general framework for finding the optimal 0-1
loss that revolves around the construction of a conflict hypergraph from the
data and adversarial constraints. We further define other variants of the
attacker-classifier game that determine the range of the optimal loss more
efficiently than the full-fledged hypergraph construction. Our evaluation
shows, for the first time, an analysis of the gap to optimal robustness for
classifiers in the multi-class setting on benchmark datasets.
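For binary labels, the pairwise (graph rather than hypergraph) special case of this construction can be sketched as follows; the one-dimensional points, the L-infinity budget, and the greedy matching are illustrative assumptions, not the paper's full framework:

```python
from itertools import combinations

def conflict_pairs(points, labels, eps):
    # Pairs of differently-labeled points whose eps-balls overlap:
    # one attacker perturbation reaches both, so any classifier must
    # err on at least one of the two.
    return [(i, j)
            for i, j in combinations(range(len(points)), 2)
            if labels[i] != labels[j] and abs(points[i] - points[j]) <= 2 * eps]

def matching_lower_bound(points, labels, eps):
    # Greedy matching over the conflict graph: matched pairs are
    # vertex-disjoint and each forces >= 1 error, so |matching| / n
    # lower-bounds the optimal adversarial 0-1 loss.
    used, matched = set(), 0
    for i, j in conflict_pairs(points, labels, eps):
        if i not in used and j not in used:
            used.update((i, j))
            matched += 1
    return matched / len(points)

# Two overlapping opposite-label pairs force >= 2 errors among 4 points.
pts = [0.0, 0.5, 10.0, 10.5]
labs = [0, 1, 0, 1]
print(matching_lower_bound(pts, labs, eps=0.3))  # 0.5
```

A maximum (rather than greedy) matching tightens the bound; the hypergraph construction in the paper goes further still.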
( 2
min )
We explore colour versus shape goal misgeneralization originally demonstrated
by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an
ambiguous choice, the agents seem to prefer generalization based on colour
rather than shape. After training over 1,000 agents in a simplified version of
the environment and evaluating them on over 10 million episodes, we conclude
that the behaviour can be attributed to the agents learning to detect the goal
object through a specific colour channel. This choice is arbitrary.
Additionally, we show how, due to underspecification, the preferences can
change when retraining the agents using exactly the same procedure except for
using a different random seed for the training run. Finally, we demonstrate the
existence of outliers in out-of-distribution behaviour based on training random
seed alone.
( 2
min )
The Classification Tree (CT) is one of the most common models in
interpretable machine learning. Although such models are usually built with
greedy strategies, in recent years, thanks to remarkable advances in
Mixed-Integer Programming (MIP) solvers, several exact formulations of the
learning problem have been developed. In this paper, we argue that some of the
most relevant ones among these training models can be encapsulated within a
general framework, whose instances are shaped by the specification of loss
functions and regularizers. Next, we introduce a novel realization of this
framework: specifically, we consider the logistic loss, handled in the MIP
setting by a linear piece-wise approximation, and couple it with
$\ell_1$-regularization terms. The resulting Optimal Logistic Tree model
numerically proves to be able to induce trees with enhanced interpretability
features and competitive generalization capabilities, compared to the
state-of-the-art MIP-based approaches.
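A minimal sketch of the kind of piecewise-linear handling of the logistic loss described above; the breakpoint grid is an assumption, and by convexity each chord upper-bounds the true loss:

```python
import math

def logistic_loss(z):
    # Numerically stable log(1 + exp(-z)).
    return math.log1p(math.exp(-abs(z))) + max(-z, 0.0)

def piecewise_linear(breaks):
    # Chords between consecutive breakpoints; convexity of the logistic
    # loss guarantees the interpolant upper-bounds the true loss on the
    # breakpoint range, which is the kind of linearization a MIP can use.
    vals = [logistic_loss(b) for b in breaks]
    def f(z):
        for k in range(len(breaks) - 1):
            b0, b1 = breaks[k], breaks[k + 1]
            if b0 <= z <= b1:
                t = (z - b0) / (b1 - b0)
                return (1 - t) * vals[k] + t * vals[k + 1]
        raise ValueError("z outside breakpoint range")
    return f

breaks = [-4 + 0.5 * k for k in range(17)]  # uniform grid on [-4, 4]
approx = piecewise_linear(breaks)
gap = max(approx(z / 100) - logistic_loss(z / 100) for z in range(-400, 401))
print(f"max over-approximation on [-4, 4]: {gap:.4f}")
```

With this grid the worst-case over-approximation stays below 0.01, small enough that the MIP optimum tracks the true logistic objective closely.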
( 2
min )
We report the effects of replacing the scaled dot-product (within softmax)
attention with the negative-log of Euclidean distance. This form of attention
simplifies to inverse-distance-weighting interpolation. Used in simple
one-hidden-layer networks and trained with vanilla cross-entropy loss on
classification problems, it tends to produce a key matrix containing prototypes
and a value matrix with corresponding logits. We also show that the resulting
interpretable networks can be augmented with manually-constructed prototypes to
perform low-impact handling of special cases.
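The claimed simplification can be verified directly: softmax over scores of the form -log d is algebraically identical to inverse distance weighting. A small numpy check (shapes and data are illustrative):

```python
import numpy as np

def neg_log_distance_attention(query, keys, values):
    # Attention with scores = -log ||q - k||; after softmax this is
    # exactly inverse-distance-weighting interpolation over the values.
    # Assumes the query coincides with no key (zero distance dominates).
    d = np.linalg.norm(keys - query, axis=1)   # Euclidean distances
    scores = -np.log(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                               # softmax
    return w @ values

def inverse_distance_weighting(query, keys, values):
    d = np.linalg.norm(keys - query, axis=1)
    w = (1.0 / d) / (1.0 / d).sum()
    return w @ values

rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(5, 4))   # 5 keys (prototype candidates)
V = rng.normal(size=(5, 3))   # 5 value rows (logit candidates)
print(np.allclose(neg_log_distance_attention(q, K, V),
                  inverse_distance_weighting(q, K, V)))  # True
```

The identity follows from exp(-log d) = 1/d, which is why the learned keys can be read as prototypes and the values as their logits.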
( 2
min )
In this paper, we study how to reconstruct dynamical systems from data
without time labels. Data without time labels appear in many applications, such
as molecular dynamics and single-cell RNA sequencing. Reconstruction of
dynamical systems from time-sequence data has been studied extensively. However,
these methods do not apply if time labels are unknown. Without time labels,
sequence data becomes distribution data. Based on this observation, we propose
to treat the data as samples from a probability distribution and try to
reconstruct the underlying dynamical system by minimizing a distribution
loss, specifically the sliced Wasserstein distance. Extensive experimental
results demonstrate the effectiveness of the proposed method.
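A minimal Monte-Carlo sketch of the sliced Wasserstein distance used as the distribution loss (the projection count and sample sizes are arbitrary choices):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, seed=0):
    # Monte-Carlo sliced 1-Wasserstein distance between two equal-size
    # samples: project onto random unit directions, then use the
    # closed-form 1-D Wasserstein distance (mean absolute difference
    # of sorted projections).
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    px, py = X @ dirs.T, Y @ dirs.T   # (n, n_proj) projections
    px.sort(axis=0)
    py.sort(axis=0)
    return np.abs(px - py).mean()

rng = np.random.default_rng(1)
A = rng.normal(size=(256, 2))
print(sliced_wasserstein(A, A))                               # 0.0
print(sliced_wasserstein(A, A + np.array([3.0, 0.0])) > 1.0)  # True
```

Because each 1-D Wasserstein distance is just a sort, the loss is cheap and differentiable almost everywhere, which is what makes it usable as a training objective.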
( 2
min )
Sentiment analysis of social media data is an emerging field with vast
applications in various domains. In this study, we developed a sentiment
analysis model to analyze social media sentiment, especially tweets, during
global conflicting scenarios. To establish our research experiment, we
identified a recent global dispute incident on Twitter and collected around
31,000 filtered tweets over several months to analyze human sentiment worldwide.
( 2
min )
A simple graph on $n$ vertices may contain a lot of maximum cliques. But how
many can it potentially contain? We will define prime and composite graphs, and
we will show that if $n \ge 15$, then the graphs with the maximum number of
maximum cliques have to be composite. Moreover, we will show an edge bound from
which we will prove that if any factor of a composite graph has $\omega(G_i)
\ge 5$, then it cannot have the maximum number of maximum cliques. Using this,
we will show that the graph that contains $3^{\lfloor n/3 \rfloor}c$ maximum
cliques has the greatest number of maximum cliques on $n$ vertices, where
$c\in\{1,\frac{4}{3},2\}$, depending on $n \text{ mod } 3$.
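The closed-form count can be evaluated directly; the assignment of $c$ to the residues of $n \bmod 3$ below follows the standard Moon-Moser pattern and is an assumption about the paper's convention:

```python
def max_clique_count(n):
    # Number of maximum cliques in the extremal graph on n vertices:
    # 3^floor(n/3) * c with c = 1, 4/3, 2 for n mod 3 = 0, 1, 2
    # (the Moon-Moser assignment, assumed here).
    q, r = divmod(n, 3)
    if r == 0:
        return 3 ** q
    if r == 1:
        return 3 ** (q - 1) * 4  # 3^floor(n/3) * 4/3, kept integral
    return 3 ** q * 2

for n in (15, 16, 17):
    print(n, max_clique_count(n))  # 243, 324, 486
```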
( 2
min )
We define and study a fully-convolutional neural network stochastic model,
NN-Turb, which generates a 1-dimensional field with some turbulent velocity
statistics. In particular, the generated process satisfies the Kolmogorov 2/3
law for second order structure function. It also presents negative skewness
across scales (i.e. Kolmogorov 4/5 law) and exhibits intermittency as
characterized by skewness and flatness. Furthermore, our model is never in
contact with turbulent data and only needs the desired statistical behavior of
the structure functions across scales for training.
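The statistical targets above are structure functions; a minimal estimator for a sampled 1-D field is sketched below (the linear-ramp check only exercises the estimator's arithmetic, not turbulence):

```python
import numpy as np

def structure_function(u, r, order=2):
    # S_p(r) = <|u(x + r) - u(x)|^p> for a 1-D field on a unit grid;
    # the Kolmogorov 2/3 law predicts S_2(r) ~ r^(2/3), and the 4/5 law
    # concerns the signed third-order increment.
    du = u[r:] - u[:-r]
    return np.mean(np.abs(du) ** order)

# Sanity check on a linear ramp u(x) = x: increments equal r exactly,
# so S_2(r) = r^2.
u = np.arange(100, dtype=float)
print([structure_function(u, r) for r in (1, 2, 4)])  # [1.0, 4.0, 16.0]
```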
( 2
min )
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
( 2
min )
Reliable uncertainty quantification (UQ) in machine learning (ML) regression
tasks is becoming the focus of many studies in materials and chemical science.
It is now well understood that average calibration is insufficient, and most
studies implement additional methods testing the conditional calibration with
respect to uncertainty, i.e. consistency. Consistency is assessed mostly by
so-called reliability diagrams. There exists however another way beyond average
calibration, which is conditional calibration with respect to input features,
i.e. adaptivity. In practice, adaptivity is the main concern of the final users
of an ML-UQ method, who seek reliable predictions and uncertainties
for any point in feature space. This article aims to show that consistency and
adaptivity are complementary validation targets, and that a good consistency
does not imply a good adaptivity. Adapted validation methods are proposed and
illustrated on a representative example.
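The distinction can be illustrated with a toy numpy construction (not the article's validation protocol): z-scores z = (y - mu)/sigma can have unit variance on average while their variance conditional on an input feature is far from 1.

```python
import numpy as np

def binned_zsq(z, groups):
    # Mean squared z-score per group; a value near 1 everywhere means
    # the stated uncertainties are conditionally calibrated for that
    # grouping (here the grouping is an input feature).
    return {g: float(np.mean(z[groups == g] ** 2)) for g in np.unique(groups)}

# Feature group A: true errors twice the stated sigma; group B: zero error.
z = np.array([np.sqrt(2), -np.sqrt(2), 0.0, 0.0])
groups = np.array(["A", "A", "B", "B"])

print(np.mean(z ** 2))        # ~1.0 -> average calibration looks fine
print(binned_zsq(z, groups))  # A ~2, B ~0 -> adaptivity fails
```

Binning by the claimed uncertainty instead of by the feature would give the consistency check; the example shows why the two can disagree.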
( 2
min )
We present a performant, general-purpose gradient-guided nested sampling
algorithm, ${\tt GGNS}$, combining the state of the art in differentiable
programming, Hamiltonian slice sampling, clustering, mode separation, dynamic
nested sampling, and parallelization. This unique combination allows ${\tt
GGNS}$ to scale well with dimensionality and perform competitively on a variety
of synthetic and real-world problems. We also show the potential of combining
nested sampling with generative flow networks to obtain large amounts of
high-quality samples from the posterior distribution. This combination leads to
faster mode discovery and more accurate estimates of the partition function.
( 2
min )
To tackle long planning horizon problems in reinforcement learning with
general function approximation, we propose the first algorithm, termed
UCRL-WVTR, whose regret bound is both \emph{horizon-free} and
\emph{instance-dependent}, eliminating the polynomial dependency on the
planning horizon. The derived regret bound is deemed \emph{sharp}, as it
matches the minimax lower bound when specialized to linear mixture MDPs up to
logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient}
with access to a regression oracle. The achievement of such a horizon-free,
instance-dependent, and sharp regret bound hinges upon (i) novel algorithm
designs: weighted value-targeted regression and a high-order moment estimator
in the context of general function approximation; and (ii) fine-grained
analyses: a novel concentration bound of weighted non-linear least squares and
a refined analysis which leads to the tight instance-dependent bound. We also
conduct comprehensive experiments to corroborate our theoretical findings.
( 2
min )
In the era of fast-paced precision medicine, observational studies play a
major role in properly evaluating new treatments in clinical practice. Yet,
unobserved confounding can significantly compromise causal conclusions drawn
from non-randomized data. We propose a novel strategy that leverages randomized
trials to quantify unobserved confounding. First, we design a statistical test
to detect unobserved confounding with strength above a given threshold. Then,
we use the test to estimate an asymptotically valid lower bound on the
unobserved confounding strength. We evaluate the power and validity of our
statistical test on several synthetic and semi-synthetic datasets. Further, we
show how our lower bound can correctly identify the absence and presence of
unobserved confounding in a real-world setting.
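A heavily simplified sketch of the idea of benchmarking observational estimates against a randomized trial; this two-sample z-test is an illustrative stand-in, not the paper's statistical test:

```python
import numpy as np

def confounding_gap_test(y1_obs, y0_obs, y1_rct, y0_rct, threshold=0.0):
    # Toy detector: compare the observational and randomized ATE
    # estimates; a gap beyond `threshold` that clears ~2 standard
    # errors suggests unobserved confounding above that strength.
    ate_obs = y1_obs.mean() - y0_obs.mean()
    ate_rct = y1_rct.mean() - y0_rct.mean()
    se = np.sqrt(sum(a.var(ddof=1) / len(a)
                     for a in (y1_obs, y0_obs, y1_rct, y0_rct)))
    gap = abs(ate_obs - ate_rct)
    return gap, gap - threshold > 2 * se

rng = np.random.default_rng(0)
y1_rct = 1.0 + rng.normal(0, 0.1, 500)  # randomized arms, true effect = 1
y0_rct = rng.normal(0, 0.1, 500)
y1_obs = 2.0 + rng.normal(0, 0.1, 500)  # healthier patients got treated
y0_obs = rng.normal(0, 0.1, 500)
gap, detected = confounding_gap_test(y1_obs, y0_obs, y1_rct, y0_rct)
print(round(gap, 2), detected)
```

Raising `threshold` until the test stops rejecting gives the flavor of the paper's lower bound on confounding strength.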
( 2
min )
Inventory management is crucial for businesses, but it can be tedious. It can make or break a business, regardless of its age. AI has revolutionized business management and inventory control. AI can now do more than just follow instructions. It can analyze inventory history, predict customer behavior, and anticipate business needs. Want to know what… Read More »Harness the power of an AI-powered forecasting model to revitalize your business
The post Harness the power of an AI-powered forecasting model to revitalize your business appeared first on Data Science Central.
( 26
min )
Between the two of them, ChatGPT4 can generate the lyrics to Christmas carols, and DALL-E3 can illustrate them!
Throw your old carol books away because this is the only guide you'll need.
12 Days of Christmas
"Please generate an illustration where each of the 12 days'
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
MIT researchers develop a customized onboarding process that helps a human learn when a model’s advice is trustworthy.
( 11
min )
We introduce SwiftSage, a novel agent framework inspired by the dual-process
theory of human cognition, designed to excel in action planning for complex
interactive reasoning tasks. SwiftSage integrates the strengths of behavior
cloning and prompting large language models (LLMs) to enhance task completion
performance. The framework comprises two primary modules: the Swift module,
representing fast and intuitive thinking, and the Sage module, emulating
deliberate thought processes. The Swift module is a small encoder-decoder LM
fine-tuned on the oracle agent's action trajectories, while the Sage module
employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a
heuristic method to harmoniously integrate the two modules, resulting in a more
efficient and robust problem-solving process. On 30 tasks from the ScienceWorld
benchmark, SwiftSage significantly outperforms other methods such as SayCan,
ReAct, and Reflexion, demonstrating its effectiveness in solving complex
interactive tasks.
( 2
min )
Maintenance work orders are commonly used to document information about wind
turbine operation and maintenance. This includes details about proactive and
reactive wind turbine downtimes, such as preventative and corrective
maintenance. However, the information contained in maintenance work orders is
often unstructured and difficult to analyze, presenting challenges for
decision-makers wishing to use it for optimizing operation and maintenance. To
address this issue, this work compares three different approaches to calculate
reliability by performance indicators from maintenance work orders. The first
approach involves manual labeling of the maintenance work orders by domain
experts, using the schema defined in an industrial guideline to assign the
label accordingly. The second approach involves the development of a model that
automatically labels the maintenance work orders using text classification
methods. Through this method, we achieve macro-average and weighted-average
F1-scores of 0.75 and 0.85, respectively. The third technique uses an
AI-assisted tagging tool to tag and structure the raw maintenance information,
together with a novel rule-based approach for extracting relevant maintenance
work orders for failure rate calculation. In our experiments, the AI-assisted
tool leads to an 88% drop in tagging time in comparison to the other two
approaches, while expert labeling and text classification are more accurate in
KPI extraction. Overall, our findings make extracting maintenance information
from maintenance work orders more efficient, enable the assessment of
reliability key performance indicators and therefore support the optimization
of wind turbine operation and maintenance.
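The reported macro-average and weighted-average F1-scores differ only in how per-class scores are aggregated; a minimal sketch (toy labels, not the wind-turbine data):

```python
import numpy as np

def f1_scores(y_true, y_pred):
    # Per-class F1, then macro (unweighted mean over classes) and
    # weighted (support-weighted mean) averages.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1, support = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
        support.append(np.sum(y_true == c))
    return float(np.mean(f1)), float(np.average(f1, weights=support))

macro, weighted = f1_scores([0, 0, 0, 1], [0, 0, 1, 1])
print(round(macro, 3), round(weighted, 3))  # 0.733 0.767
```

As here, the weighted average exceeds the macro average whenever the frequent classes are the better-classified ones, which is why the paper reports both.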
( 3
min )
Physics informed neural networks (PINNs) have recently been widely used for
robust and accurate approximation of PDEs. We provide rigorous upper bounds on
the generalization error of PINNs approximating solutions of the forward
problem for PDEs. An abstract formalism is introduced and stability properties
of the underlying PDE are leveraged to derive an estimate for the
generalization error in terms of the training error and number of training
samples. This abstract framework is illustrated with several examples of
nonlinear PDEs. Numerical experiments, validating the proposed theory, are also
presented.
( 2
min )
Recent advances in language models (LMs) have demonstrated significant
efficacy in tasks related to the arts and humanities. While LMs have exhibited
exceptional performance across a wide range of natural language processing
tasks, there are notable challenges associated with their utilization on small
datasets and their ability to replicate more creative human capacities. In this
study, we aim to address these challenges by training a Persian classical
poetry generation model using a transformer architecture on a specialized
dataset with no pretraining. Additionally, we propose a novel decoding method
to enhance coherence and meaningfulness in the generated poetry, effectively
managing the tradeoff between diversity and quality. Furthermore, the results
of our training approach and the proposed decoding method are evaluated through
a comprehensive set of automatic and human evaluations, which show a superior
capability to generate coherent and meaningful poetry compared to other
decoding methods and an existing Persian large language model (LLM).
( 2
min )
Knowledge graph construction (KGC) is a multifaceted undertaking involving
the extraction of entities, relations, and events. Traditionally, large
language models (LLMs) have been viewed as solitary task-solving agents in this
complex landscape. However, this paper challenges this paradigm by introducing
a novel framework, CooperKGC. Departing from the conventional approach,
CooperKGC establishes a collaborative processing network, assembling a KGC
collaboration team capable of concurrently addressing entity, relation, and
event extraction tasks. Our experiments unequivocally demonstrate that
fostering collaboration and information interaction among diverse agents within
CooperKGC yields superior results compared to individual cognitive processes
operating in isolation. Importantly, our findings reveal that the collaboration
facilitated by CooperKGC enhances knowledge selection, correction, and
aggregation capabilities across multiple rounds of interactions.
( 2
min )
Recent research on online Gradient Balancing (GraB) has revealed that there
exist permutation-based example orderings for SGD that are guaranteed to
outperform random reshuffling (RR). Whereas RR arbitrarily permutes training
examples, GraB leverages stale gradients from prior epochs to order examples --
achieving a provably faster convergence rate than RR. However, GraB is limited
by design: while it demonstrates an impressive ability to scale-up training on
centralized data, it does not naturally extend to modern distributed ML
workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which
uses insights from prior work on kernel thinning to translate the benefits of
provably faster permutation-based example ordering to distributed settings.
With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate
over centralized GraB and outperforms distributed RR on a variety of benchmark
tasks.
( 2
min )
When optimizing problems with uncertain parameter values in a linear
objective, decision-focused learning enables end-to-end learning of these
values. We are interested in a stochastic scheduling problem, in which
processing times are uncertain, which brings uncertain values in the
constraints, and thus repair of an initial schedule may be needed. Historical
realizations of the stochastic processing times are available. We show how
existing decision-focused learning techniques based on stochastic smoothing can
be adapted to this scheduling problem. We include an extensive experimental
evaluation to investigate in which situations decision-focused learning
outperforms the state of the art for such situations: scenario-based stochastic
optimization.
( 2
min )
Among the commonly used non-destructive techniques, the Ground Penetrating
Radar (GPR) is one of the most widely adopted today for assessing pavement
conditions in France. However, conventional radar systems and their forward
processing methods have shown their limitations for the physical and
geometrical characterization of very thin layers such as tack coats. However,
the use of Machine Learning methods applied to GPR with an inverse approach
showed that it was numerically possible to identify the tack coat
characteristics despite masking effects due to low timefrequency resolution
noted in the raw B-scans. Thus, we propose in this paper to apply the inverse
approach based on Machine Learning, already validated in previous works on
numerical data, on two experimental cases with different pavement structures.
The first case corresponds to a validation on known pavement structures on the
Gustave Eiffel University (Nantes, France) with its pavement fatigue carousel
and the second case focuses on a new real road in Vend{\'e}e department
(France). In both case studies, the performance of SVM/SVR methods showed the
efficiency of supervised learning methods to classify and estimate the emulsion
proportioning in the tack coats.
( 3
min )
This research introduces a sophisticated transfer learning model based on
Google's MobileNetV2 for breast cancer tumor classification into normal,
benign, and malignant categories, utilizing a dataset of 1576 ultrasound images
(265 normal, 891 benign, 420 malignant). The model achieves an accuracy of
0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and
MCC of 0.74. It examines image intensity distributions and misclassification
errors, offering improvements for future applications. Addressing dataset
imbalances, the study ensures a generalizable model. This work, using a dataset
from Baheya Hospital, Cairo, Egypt, compiled by Walid Al-Dhabyani et al.,
emphasizes MobileNetV2's potential in medical imaging, aiming to improve
diagnostic precision in oncology. Additionally, the paper explores
Streamlit-based deployment for real-time tumor classification, demonstrating
MobileNetV2's applicability in medical imaging and setting a benchmark for
future research in oncology diagnostics.
( 2
min )
We study the asymptotic generalization of an overparameterized linear model
for multiclass classification under the Gaussian covariates bi-level model
introduced in Subramanian et al.~'22, where the number of data points,
features, and classes all grow together. We fully resolve the conjecture posed
in Subramanian et al.~'22, matching the predicted regimes for generalization.
Furthermore, our new lower bounds are akin to an information-theoretic strong
converse: they establish that the misclassification rate goes to 0 or 1
asymptotically. One surprising consequence of our tight results is that the
min-norm interpolating classifier can be asymptotically suboptimal relative to
noninterpolating classifiers in the regime where the min-norm interpolating
regressor is known to be optimal.
The key to our tight analysis is a new variant of the Hanson-Wright
inequality which is broadly useful for multiclass problems with sparse labels.
As an application, we show that the same type of analysis can be used to
analyze the related multilabel classification problem under the same bi-level
ensemble.
( 2
min )
Recent advances in machine learning, specifically transformer architecture,
have led to significant advancements in commercial domains. These powerful
models have demonstrated superior capability to learn complex relationships and
often generalize better to new data and problems. This paper presents a novel
transformer-powered approach for enhancing prediction accuracy in multi-modal
output scenarios, where sparse experimental data is supplemented with
simulation data. The proposed approach integrates transformer-based
architecture with a novel graph-based hyper-parameter optimization technique.
The resulting system not only effectively reduces simulation bias, but also
achieves superior prediction accuracy compared to the prior method. We
demonstrate the efficacy of our approach on inertial confinement fusion
experiments, where only 10 shots of real-world data are available, as well as
synthetic versions of these experiments.
( 2
min )
This paper engages in a speculative exploration of the concept of an
artificial agent capable of conducting research. Initially, it examines how the
act of research can be conceptually characterized, aiming to provide a starting
point for discussions about what it means to create such agents. The focus then
shifts to the core components of research: question formulation, hypothesis
generation, and hypothesis verification. This discussion includes a
consideration of the potential and challenges associated with enabling machines
to autonomously perform these tasks. Subsequently, this paper briefly considers
the overlapping themes and interconnections that underlie them. Finally, the
paper presents preliminary thoughts on prototyping as an initial step towards
uncovering the challenges involved in developing these research-capable agents.
( 2
min )
In this paper, we propose a dimensionless anomaly detection method for
multivariate streams. Our method is independent of the unit of measurement for
the different stream channels, therefore dimensionless. We first propose the
variance norm, a generalisation of Mahalanobis distance to handle
infinite-dimensional feature space and singular empirical covariance matrix
rigorously. We then combine the variance norm with the path signature, an
infinite collection of iterated integrals that provide global features of
streams, to propose SigMahaKNN, a method for anomaly detection on
(multivariate) streams. We show that SigMahaKNN is invariant to stream
reparametrisation, stream concatenation and has a graded discrimination power
depending on the truncation level of the path signature. We implement
SigMahaKNN as an open-source software, and perform extensive numerical
experiments, showing significantly improved anomaly detection on streams
compared to isolation forest and local outlier factors in applications ranging
from language analysis, hand-writing analysis, ship movement paths analysis and
univariate time-series analysis.
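A finite-dimensional sketch of the variance-norm idea, using the covariance pseudo-inverse to handle a singular empirical covariance; the paper's construction additionally covers infinite-dimensional feature spaces rigorously:

```python
import numpy as np

def variance_norm(x, X):
    # Mahalanobis-style norm of x relative to sample X, using the
    # covariance pseudo-inverse so a singular empirical covariance
    # (e.g. more features than samples) is handled gracefully.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    d = x - mu
    return float(np.sqrt(d @ np.linalg.pinv(cov) @ d))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))  # 5 samples, 10 features: singular covariance
print(variance_norm(X.mean(axis=0), X))  # 0.0 at the sample mean
print(variance_norm(X[0], X) > 0)        # positive away from the mean
```

In SigMahaKNN the role of the raw features is played by truncated path signatures of the streams, which is what buys the reparametrisation invariance.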
( 2
min )
Algorithms make a growing portion of policy and business decisions. We
develop a treatment-effect estimator using algorithmic decisions as instruments
for a class of stochastic and deterministic algorithms. Our estimator is
consistent and asymptotically normal for well-defined causal effects. A special
case of our setup is multidimensional regression discontinuity designs with
complex boundaries. We apply our estimator to evaluate the Coronavirus Aid,
Relief, and Economic Security Act, which allocated many billions of dollars
worth of relief funding to hospitals via an algorithmic rule. The funding is
shown to have little effect on COVID-19-related hospital activities. Naive
estimates exhibit selection bias.
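The instrument logic can be sketched with a toy Wald estimator on a single binary algorithmic decision; the data are constructed for illustration, and the paper's estimator covers far more general stochastic and deterministic algorithms:

```python
import numpy as np

def iv_estimate(z, d, y):
    # Wald / 2SLS estimate with a single instrument:
    # beta = cov(z, y) / cov(z, d).
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

# Algorithmic rule z shifts treatment d; u is an unobserved confounder,
# uncorrelated with z by construction. True effect of d on y is 2.
z = np.array([0.0, 0.0, 1.0, 1.0])
u = np.array([1.0, -1.0, 1.0, -1.0])
d = z + u
y = 2 * d + u

ols = np.cov(d, y)[0, 1] / np.cov(d, d)[0, 1]
print(iv_estimate(z, d, y))  # 2.0: the instrument removes selection bias
print(ols)                   # 2.8: the naive estimate is biased upward
```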
( 2
min )
There has been a lot of work in question generation, in which different
methods of providing target answers as input have been employed. This
experimentation has been mostly carried out for RNN-based models. We use three different
methods and their combinations for incorporating answer information and explore
their effect on several automatic evaluation metrics. The methods that are used
are answer prompting; a custom product method combining answer embeddings and
encoder outputs; selecting sentences from the input paragraph that contain
answer-related information; and a separate cross-attention block in
the decoder that attends to the answer. We observe that answer prompting
without any additional modes obtains the best ROUGE and METEOR scores.
Additionally, we use a custom metric to calculate how many of the
generated questions have the same answer as the one used to
generate them.
( 2
min )
We present a robust membership inference attack (RMIA) that amplifies the
distinction between population data and the training data on any target model,
by effectively leveraging both reference models and reference data in our
likelihood ratio test. Our algorithm exhibits superior test power
(true-positive rate) when compared to prior methods, even at extremely low
false-positive error rates (as low as 0). Also, under computation constraints,
where only a limited number of reference models (as few as 1) are available,
our method performs exceptionally well, unlike some prior attacks that approach
random guessing in such scenarios. Our method lays the groundwork for
cost-effective and practical yet powerful and robust privacy risk analysis of
machine learning algorithms.
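A toy rendering of the pairwise likelihood-ratio idea: a candidate point's target-to-reference probability ratio is compared against the same ratio computed for population points. The numbers are fabricated for illustration, and this omits most of RMIA's machinery:

```python
import numpy as np

def rmia_style_score(p_target_x, p_refs_x, p_target_z, p_refs_z):
    # Toy pairwise likelihood-ratio score: the candidate x's
    # target-vs-reference probability ratio, measured against the same
    # ratio for population points z; the score is the fraction of z
    # that x dominates (thresholding it yields the membership call).
    ratio_x = p_target_x / np.mean(p_refs_x)
    ratio_z = p_target_z / np.mean(p_refs_z, axis=1)
    return np.mean(ratio_x > ratio_z)

# A member gets a boosted probability from the target model; population
# points and reference models all sit near the base rate.
p_target_member = 0.95
p_refs_member = np.array([0.5, 0.55, 0.45])
p_target_pop = np.array([0.50, 0.52, 0.48, 0.51])
p_refs_pop = np.full((4, 3), 0.5)

print(rmia_style_score(p_target_member, p_refs_member,
                       p_target_pop, p_refs_pop))  # 1.0
```

Even with a single reference model the comparison against population points keeps the score informative, which is the regime the abstract highlights.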
( 2
min )
In causal models, a given mechanism is assumed to be invariant to changes of
other mechanisms. While this principle has been utilized for inference in
settings where the causal variables are observed, theoretical insights when the
variables of interest are latent are largely missing. We assay the connection
between invariance and causal representation learning by establishing
impossibility results which show that invariance alone is insufficient to
identify latent causal variables. Together with practical considerations, we
use these theoretical findings to highlight the need for additional constraints
in order to identify representations by exploiting invariance.
( 2
min )
Associated to each graph G is a Gaussian graphical model. Such models are
often used in high-dimensional settings, i.e. where there are relatively few
data points compared to the number of variables. The maximum likelihood
threshold of a graph is the minimum number of data points required to fit the
corresponding graphical model using maximum likelihood estimation. Graphical
lasso is a method for selecting and fitting a graphical model. In this project,
we ask: when graphical lasso is used to select and fit a graphical model on n
data points, how likely is it that n is greater than or equal to the maximum
likelihood threshold of the corresponding graph? Our results are a series of
computational experiments.
( 2
min )
Partially observable constrained optimization problems (POCOPs) impede
data-driven optimization techniques since an infeasible solution of POCOPs can
provide little information about the objective as well as the constraints. We
endeavor to design an efficient and provable method for expensive POCOPs under
the framework of constrained Bayesian optimization. Our method consists of two
key components. Firstly, we present an improved design of the acquisition
functions that introduces balanced exploration during optimization. We
rigorously study the convergence properties of this design to demonstrate its
effectiveness. Secondly, we propose a Gaussian process embedding different
likelihoods as the surrogate model for a partially observable constraint. This
model leads to a more accurate representation of the feasible regions compared
to traditional classification-based models. Our proposed method is empirically
studied on both synthetic and real-world problems. The results demonstrate the
competitiveness of our method for solving POCOPs.
( 2
min )
The central problem in materials science is to discover materials with desired properties. MatterGen enables broad property-guided materials design.
The post MatterGen: Property-guided materials design appeared first on Microsoft Research.
( 8
min )
Advanced prompting technologies for LLMs can lead to excessively long prompts, causing issues. Learn how LLMLingua compresses prompts up to 20x, maintaining quality, reducing latency, and supporting improved UX.
The post LLMLingua: Innovating LLM efficiency with prompt compression appeared first on Microsoft Research.
( 10
min )
Accessibility is a key element that all designers must consider before constructing a space or product — but the evaluation process has traditionally been tedious and time-consuming. Mathew Schwartz, an assistant professor in architecture and design at the New Jersey Institute of Technology, is using the NVIDIA Omniverse platform and the Universal Scene Description framework…
( 7
min )
It’s a fortuitous GFN Thursday with 17 new games joining the GeForce NOW library, including The Day Before, Avatar: Frontiers of Pandora and the 100th PC Game Pass title to join the cloud — Ori and the Will of the Wisps. This week also marks a milestone: over 500 games and applications now support RTX…
( 8
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Mehmet İkbal Özmen, Hasan Burak Yel, Fatma Nur Dumlupınar Keşir, Mutlu Polatcan and Emre Uzel from Getir. Getir is the pioneer of ultrafast grocery delivery. The technology company has revolutionized last-mile delivery with its grocery in-minutes delivery proposition. Getir was founded in 2015 and operates […]
( 8
min )
Using machine learning, the computational method can provide details of how materials work as catalysts, semiconductors, or battery components.
( 11
min )
Double descent presents a counter-intuitive aspect within the machine
learning domain, and researchers have observed its manifestation in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory to account for its
occurrence in deep learning remains yet to be established. In this study, we
revisit the phenomenon of double descent and demonstrate that its occurrence is
strongly influenced by the presence of noisy data. Through conducting a
comprehensive analysis of the feature space of learned representations, we
unveil that double descent arises in imperfect models trained with noisy data.
We argue that double descent is a consequence of the model first learning the
noisy data until interpolation and then, through the implicit regularization
brought by over-parameterization, acquiring the capability to separate the
information from the noise.
( 2
min )
Adopting reasonable strategies is challenging but crucial for an intelligent
agent with limited resources working in hazardous, unstructured, and dynamic
environments to improve the system's utility, decrease the overall cost, and
increase mission success probability. This paper proposes a novel directed
acyclic strategy graph decomposition approach based on Bayesian chaining to
separate an intricate policy into several simple sub-policies and organize
their relationships as Bayesian strategy networks (BSN). We integrate this
approach into the state-of-the-art DRL method -- soft actor-critic (SAC), and
build the corresponding Bayesian soft actor-critic (BSAC) model by organizing
several sub-policies as a joint policy. We compare our method against the
state-of-the-art deep reinforcement learning algorithms on the standard
continuous control benchmarks in the OpenAI Gym environment. The results
demonstrate the promising potential of the BSAC method, which significantly
improves training efficiency.
( 2
min )
Computational pathology models rarely utilise data that will not be available
for inference. This means most models cannot learn from highly informative data
such as additional immunohistochemical (IHC) stains and spatial
transcriptomics. We present TriDeNT, a novel self-supervised method for
utilising privileged data that is not available during inference to improve
performance. We demonstrate the efficacy of this method for a range of
different paired data including immunohistochemistry, spatial transcriptomics
and expert nuclei annotations. In all settings, TriDeNT outperforms other
state-of-the-art methods in downstream tasks, with observed improvements of up
to 101%. Furthermore, we provide qualitative and quantitative measurements of
the features learned by these models and how they differ from baselines.
TriDeNT offers a novel method to distil knowledge from scarce or costly data
during training, to create significantly better models for routine inputs.
( 2
min )
Guaranteeing safe behaviour of reinforcement learning (RL) policies poses
significant challenges for safety-critical applications, despite RL's
generality and scalability. To address this, we propose a new approach to apply
verification methods from control theory to learned value functions. By
analyzing task structures for safety preservation, we formalize original
theorems that establish links between value functions and control barrier
functions. Further, we propose novel metrics for verifying value functions in
safe control tasks and practical implementation details to improve learning.
Our work presents a novel method for certificate learning, which unlocks a
diversity of verification techniques from control theory for RL policies, and
marks a significant step towards a formal framework for the general, scalable,
and verifiable design of RL-based control systems. Code and videos are
available at https://rl-cbf.github.io/
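For readers unfamiliar with control barrier functions, the standard condition is worth recalling (a textbook formulation; the paper's original theorems linking value functions to CBFs go beyond this sketch):

```latex
% Control-affine system: \dot{x} = f(x) + g(x)u.
% h is a control barrier function for the safe set C = \{x : h(x) \ge 0\}
% if there exists an extended class-K function \alpha such that
\sup_{u}\, \big[ L_f h(x) + L_g h(x)\, u \big] \;\ge\; -\alpha(h(x)).
```

Any Lipschitz controller satisfying this inequality renders the safe set forward invariant; the paper's idea, roughly, is that a learned value function can be verified to play the role of $h$.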
( 2
min )
Physics-informed neural networks (PINNs) constitute a flexible approach to
both finding solutions and identifying parameters of partial differential
equations. Most works on the topic assume noiseless data, or data contaminated
with weak Gaussian noise. We show that the standard PINN framework breaks down
in case of non-Gaussian noise. We give a way of resolving this fundamental
issue and we propose to jointly train an energy-based model (EBM) to learn the
correct noise distribution. We illustrate the improved performance of our
approach using multiple examples.
( 2
min )
In this paper, we prove that an Adam-type algorithm with smooth clipping
approaches the global minimizer of the regularized non-convex loss function.
Adding smooth clipping and taking the state space as the set of all
trajectories, we can apply the ergodic theory of Markov semigroups for this
algorithm and investigate its asymptotic behavior. The ergodic theory we
establish in this paper reduces the problem of evaluating the convergence,
generalization error and discretization error of this algorithm to the problem
of evaluating the difference between two functional stochastic differential
equations (SDEs) with different drift coefficients. As a result of our
analysis, we show that this algorithm minimizes the regularized
non-convex loss function with errors of the form $n^{-1/2}$, $\eta^{1/4}$,
$\beta^{-1} \log (\beta + 1)$ and $e^{- c t}$. Here, $c$ is a constant and $n$,
$\eta$, $\beta$ and $t$ denote the size of the training dataset, learning rate,
inverse temperature and time, respectively.
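The abstract does not specify the clipping function here; as a hedged illustration, the sketch below uses a `tanh`-based smooth clip (an assumption, not the paper's definition) inside a standard Adam update, minimizing a toy regularized quadratic:

```python
import math

def smooth_clip(g, c=1.0):
    # Smooth, bounded surrogate for hard clipping: output lies in (-c, c),
    # and smooth_clip(g) ~ g for |g| << c. Illustrative choice only.
    return c * math.tanh(g / c)

def adam_smooth_clip(grad_fn, x0, steps=5000, eta=0.01,
                     beta1=0.9, beta2=0.999, eps=1e-8, c=1.0):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = smooth_clip(grad_fn(x), c)        # clip before the moment updates
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy regularized loss f(x) = (x - 3)^2 + 0.01 x^2 (convex, purely for
# illustration), whose unique minimizer is x* = 6 / 2.02 ~ 2.970.
x_star = adam_smooth_clip(lambda x: 2 * (x - 3) + 0.02 * x, x0=-10.0)
```

Because `tanh` is smooth and bounded, the clipped gradient is a Lipschitz function of the raw gradient, which is the kind of regularity such ergodic analyses typically require.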
( 2
min )
Knowledge tracing consists in predicting the performance of some students on
new questions given their performance on previous questions, and can be a prior
step to optimizing assessment and learning. Deep knowledge tracing (DKT) is a
competitive model for knowledge tracing relying on recurrent neural networks,
even if some simpler models may match its performance. However, little is known
about why DKT works so well. In this paper, we frame deep knowledge tracing as
an encoder-decoder architecture. This viewpoint not only allows us to propose
better models in terms of performance, simplicity or expressivity but also
opens up promising avenues for future research directions. In particular, we
show on several small and large datasets that a simpler decoder, with possibly
fewer parameters than the one used by DKT, can predict student performance
better.
( 2
min )
Deep learning (DL) and machine learning (ML) applications have grown rapidly
in recent years. Massive amounts of data generated over the internet can yield
meaningful results when processed with ML and DL algorithms, and hardware
resources and open-source libraries have made these algorithms easy to
implement. TensorFlow and PyTorch are two of the leading frameworks for
implementing ML projects. Using these frameworks, we can trace the operations
executed on both GPU and CPU to analyze resource allocation and consumption.
This paper presents the time and memory allocation of CPU and GPU while
training deep neural networks using PyTorch. Our analysis shows that the GPU
has a lower running time than the CPU for deep neural networks, while for
simpler networks the GPU offers no significant improvement over the CPU.
( 2
min )
The effectiveness of a model is heavily reliant on the quality of the fusion
representation of multiple modalities in multimodal sentiment analysis.
Moreover, each modality is extracted from raw input and integrated with the
rest to construct a multimodal representation. Although previous methods have
proposed multimodal representations and achieved promising results, most of
them focus on forming positive and negative pairs, neglecting the variation in
sentiment scores within the same class. Additionally, they fail to capture the
significance of unimodal representations in the fusion vector. To address these
limitations, we introduce a framework called Supervised Angular-based
Contrastive Learning for Multimodal Sentiment Analysis. This framework aims to
enhance discrimination and generalizability of the multimodal representation
and overcome biases in the fusion vector's modality. Our experimental results,
along with visualizations on two widely used datasets, demonstrate the
effectiveness of our approach.
( 2
min )
We discuss the fundamental issue of identification in linear instrumental
variable (IV) models with unknown IV validity. With the assumption of the
"sparsest rule", which is equivalent to the plurality rule but becomes
operational in computation algorithms, we investigate and prove the advantages
of non-convex penalized approaches over other IV estimators based on two-step
selections, in terms of selection consistency and accommodation for
individually weak IVs. Furthermore, we propose a surrogate sparsest penalty
that aligns with the identification condition and provides oracle sparse
structure simultaneously. Desirable theoretical properties are derived for the
proposed estimator with weaker IV strength conditions compared to the previous
literature. Finite sample properties are demonstrated using simulations and the
selection and estimation method is applied to an empirical study concerning the
effect of BMI on diastolic blood pressure.
( 2
min )
Most neural compression models are trained on large datasets of images or
videos in order to generalize to unseen data. Such generalization typically
requires large and expressive architectures with a high decoding complexity.
Here we introduce C3, a neural compression method with strong rate-distortion
(RD) performance that instead overfits a small model to each image or video
separately. The resulting decoding complexity of C3 can be an order of
magnitude lower than neural baselines with similar RD performance. C3 builds on
COOL-CHIC (Ladune et al.) and makes several simple and effective improvements
for images. We further develop new methodology to apply C3 to videos. On the
CLIC2020 image benchmark, we match the RD performance of VTM, the reference
implementation of the H.266 codec, with less than 3k MACs/pixel for decoding.
On the UVG video benchmark, we match the RD performance of the Video
Compression Transformer (Mentzer et al.), a well-established neural video
codec, with less than 5k MACs/pixel for decoding.
( 2
min )
This paper presents a method for finding a sparse representation of Barron
functions. Specifically, given an $L^2$ function $f$, the inverse scale space
flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between
the Barron function associated to the measure $\mu$ and the function $f$. The
convergence properties of this method are analysed in an ideal setting and in
the cases of measurement noise and sampling bias. In an ideal setting the
objective decreases strictly monotonically in time, converging to a minimizer
at rate $\mathcal{O}(1/t)$; in the case of measurement noise or sampling bias,
the optimum is achieved up to a multiplicative or additive constant. This
convergence is preserved on discretization of the parameter space, and the
minimizers on increasingly fine discretizations converge to the optimum on the
full parameter space.
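For context, the inverse scale space flow is typically written as a differential inclusion (a generic formulation in the spirit of Burger et al.; the paper's exact loss and sparsity functional may differ). With data-fit energy $E(\mu) = \tfrac12 \| f_\mu - f \|_{L^2}^2$, where $f_\mu$ is the Barron function induced by the measure $\mu$, and a sparsity-promoting functional $J$:

```latex
\partial_t p(t) = -\nabla E(\mu(t)), \qquad p(t) \in \partial J(\mu(t)), \qquad p(0) = 0.
```

The dual variable $p$ accumulates the negative gradient of the data fit, and $\mu(t)$ stays sparse because it can be nonzero only where $p(t)$ reaches the boundary of the subdifferential $\partial J(0)$.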
( 2
min )
The Street View House Numbers (SVHN) dataset is a popular benchmark dataset
in deep learning. Originally designed for digit classification tasks, the SVHN
dataset has been widely used as a benchmark for various other tasks including
generative modeling. However, with this work, we aim to warn the community
about an issue of the SVHN dataset as a benchmark for generative modeling
tasks: we discover that the official training and test sets of the SVHN
dataset are not drawn from the same distribution. We empirically show
that this distribution mismatch has little impact on the classification task
(which may explain why this issue has not been detected before), but it
severely affects the evaluation of probabilistic generative models, such as
Variational Autoencoders and diffusion models. As a workaround, we propose to
mix and re-split the official training and test set when SVHN is used for tasks
other than classification. We publish a new split and the indices we used to
create it at https://jzenn.github.io/svhn-remix/ .
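The proposed workaround is mechanical enough to sketch. The code below is a generic pool-shuffle-resplit; note the published split at the URL above uses its own fixed indices, so this is only an illustration:

```python
import random

def remix_split(train, test, seed=0):
    """Pool the official train/test sets, shuffle deterministically, and
    re-split so both halves come from the same mixture distribution.
    (Generic sketch; the published remix uses its own fixed indices.)"""
    pool = list(train) + list(test)
    order = list(range(len(pool)))
    random.Random(seed).shuffle(order)
    cut = len(train)                      # keep the original set sizes
    new_train = [pool[i] for i in order[:cut]]
    new_test = [pool[i] for i in order[cut:]]
    return new_train, new_test

# Stand-in samples: (identifier, digit-label) pairs.
train = [("train%d" % i, i % 10) for i in range(700)]
test = [("test%d" % i, i % 10) for i in range(300)]
new_train, new_test = remix_split(train, test)
```

Both new halves are now drawn from the same pooled distribution, which is what probabilistic evaluation (e.g. of VAEs or diffusion models) assumes.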
( 2
min )
Toronto Pearson International Airport, in Ontario, Canada, is the country’s largest and busiest airport, serving some 50 million passengers each year. To enhance traveler experiences, the airport in June deployed the Zensors AI platform, which uses anonymized footage from existing security cameras to generate spatial data that helps optimize operations in real time…
( 7
min )
Move over, Merriam-Webster: Enterprises this year found plenty of candidates to add for word of the year. “Generative AI” and “generative pretrained transformer” were followed by terms such as “large language models” and “retrieval-augmented generation” (RAG) as whole industries turned their attention to transformative new technologies. Generative AI started the year as a blip…
( 17
min )
A new era of autonomous vehicle technology, known as AV 2.0, has emerged, marked by large, unified AI models that can control multiple parts of the vehicle stack, from perception and planning to control. Wayve, a London-based autonomous driving technology company, is leading the surf. In the latest episode of NVIDIA’s AI Podcast, host Katie…
( 6
min )
Despite the seemingly unstoppable adoption of LLMs across industries, they are one component of a broader technology ecosystem that is powering the new AI wave. Many conversational AI use cases require LLMs like Llama 2, Flan T5, and Bloom to respond to user queries. These models rely on parametric knowledge to answer questions. The model […]
( 11
min )
Summarization is the technique of condensing sizable information into a compact and meaningful form, and stands as a cornerstone of efficient communication in our information-rich age. In a world full of data, summarizing long texts into brief summaries saves time and helps make informed decisions. Summarization condenses content, saving time and improving clarity by presenting […]
( 13
min )
Conversational AI has come a long way in recent years thanks to the rapid developments in generative AI, especially the performance improvements of large language models (LLMs) introduced by training techniques such as instruction fine-tuning and reinforcement learning from human feedback. When prompted correctly, these models can carry coherent conversations without any task-specific training data. […]
( 18
min )
This post is co-written with Stanislav Yeshchenko from Q4 Inc. Enterprises turn to Retrieval Augmented Generation (RAG) as a mainstream approach to building Q&A chatbots. We continue to see emerging challenges stemming from the nature of the assortment of datasets available. These datasets are often a mix of numerical and text data, at times structured, […]
( 18
min )
Explore the latest AI innovations aiming to advance the software development lifecycle. AdaptivePaste adapts and refines pasted code snippets in an IDE. InferFix automates bug detection and repair. Discover how.
The post Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow appeared first on Microsoft Research.
( 10
min )
Research Focus: Using LLMs in a Rust-based formal verification framework; Rethinking network measurements with user feedback; 3D telemedicine using HoloportationTM communication technology could enhance overseas surgical visits.
The post Research Focus: Week of December 4, 2023 appeared first on Microsoft Research.
( 9
min )
During 18 years of leadership, Evans established new R&D mission areas, strengthened ties to the MIT community, and increased inclusion and education efforts.
( 11
min )
The data-driven approach to robot control has been gathering pace rapidly,
yet generalization to unseen task domains remains a critical challenge. We
argue that the key to generalization is representations that are (i) rich
enough to capture all task-relevant information and (ii) invariant to
superfluous variability between the training and the test domains. We
experimentally study such a representation -- containing both depth and
semantic information -- for visual navigation and show that it enables a
control policy trained entirely in simulated indoor scenes to generalize to
diverse real-world environments, both indoors and outdoors. Further, we show
that our representation reduces the A-distance between the training and test
domains, improving the generalization error bound as a result. Our proposed
approach is scalable: the learned policy improves continuously, as the
foundation models that it exploits absorb more diverse data during
pre-training.
( 2
min )
Denoising is intuitively related to projection. Indeed, under the manifold
hypothesis, adding random noise is approximately equivalent to orthogonal
perturbation. Hence, learning to denoise is approximately learning to project.
In this paper, we use this observation to reinterpret denoising diffusion
models as approximate gradient descent applied to the Euclidean distance
function. We then provide straight-forward convergence analysis of the DDIM
sampler under simple assumptions on the projection-error of the denoiser.
Finally, we propose a new sampler based on two simple modifications to DDIM
using insights from our theoretical results. In as few as 5-10 function
evaluations, our sampler achieves state-of-the-art FID scores on pretrained
CIFAR-10 and CelebA models and can generate high quality samples on latent
diffusion models.
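The "denoising as projection" view can be illustrated on a toy manifold. Below, the unit circle stands in for the data manifold and an oracle denoiser returns the orthogonal projection (an illustrative stand-in, not the paper's learned denoiser or its modified DDIM sampler):

```python
import math

def denoise(x, y):
    """Oracle denoiser for a toy data manifold (the unit circle):
    returns the nearest manifold point, i.e. the orthogonal projection."""
    r = math.hypot(x, y)
    return x / r, y / r

def project_sample(x, y, steps=10, step_size=0.5):
    """Gradient descent on (half) the squared distance to the manifold;
    the descent direction is exactly denoise(x, y) - (x, y)."""
    for _ in range(steps):
        px, py = denoise(x, y)
        x += step_size * (px - x)
        y += step_size * (py - y)
    return x, y

x, y = project_sample(3.0, 4.0)   # start far off-manifold
```

Each step contracts the distance to the manifold by a factor of (1 − step_size), so a handful of iterations already lands very close to it — a loose analogue of the few-step sampling the abstract reports.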
( 2
min )
This paper proposes a multiblock alternating direction method of multipliers
for solving a class of multiblock nonsmooth nonconvex optimization problem with
nonlinear coupling constraints. We employ a majorization minimization procedure
in the update of each block of the primal variables. Subsequential and global
convergence of the generated sequence to a critical point of the augmented
Lagrangian are proved. We also establish iteration complexity and provide
preliminary numerical results for the proposed algorithm.
( 2
min )
Signal Temporal Logic (STL) is a powerful framework for describing the
complex temporal and logical behaviour of dynamical systems. Numerous
studies have attempted to employ reinforcement learning to learn a controller
that enforces STL specifications; however, they have been unable to effectively
tackle the challenges of ensuring robust satisfaction in continuous state space
and maintaining tractability. In this paper, leveraging the concept of funnel
functions, we propose a tractable reinforcement learning algorithm to learn a
time-dependent policy for robust satisfaction of STL specification in
continuous state space. We demonstrate the utility of our approach on several
STL tasks using different environments.
( 2
min )
Hippocampal atrophy in Alzheimer's disease (AD) is asymmetric and spatially
inhomogeneous. While extensive work has been done on volume and shape analysis
of atrophy of the hippocampus in AD, less attention has been given to
hippocampal asymmetry specifically. Previous studies of hippocampal asymmetry
are limited to global volume or shape measures, which don't localize shape
asymmetry at the point level. In this paper, we propose to quantify localized
shape asymmetry by optimizing point correspondences between left and right
hippocampi within a subject, while simultaneously favoring a compact
statistical shape model of the entire sample. To account for related variables
that have impact on AD and healthy subject differences, we build linear models
with other confounding factors. Our results on the OASIS3 dataset demonstrate
that compared to using volumetric information, shape asymmetry reveals
fine-grained, localized differences that indicate the hippocampal regions of
most significant shape asymmetry in AD patients.
( 2
min )
This work introduces BRILLsson, a novel binary neural network-based
representation learning model for a broad range of non-semantic speech tasks.
We train the model with knowledge distillation from a large and real-valued
TRILLsson model with only a fraction of the dataset used to train TRILLsson.
The resulting BRILLsson models are only 2MB in size with a latency less than
8ms, making them suitable for deployment in low-resource devices such as
wearables. We evaluate BRILLsson on eight benchmark tasks (including but not
limited to spoken language identification, emotion recognition, health
condition diagnosis, and keyword spotting), and demonstrate that our proposed
ultra-light and low-latency models perform as well as large-scale models.
( 2
min )
This paper proposes a weakly-supervised machine learning-based approach
aiming at a tool to alert patients about possible respiratory diseases. Various
types of pathologies may affect the respiratory system, potentially leading to
severe diseases and, in certain cases, death. In general, effective prevention
practices are considered major factors in improving the patient's health
condition. The proposed method strives to realize an easily
accessible tool for the automatic diagnosis of respiratory diseases.
Specifically, the method leverages Variational Autoencoder architectures
permitting the usage of training pipelines of limited complexity and relatively
small-sized datasets. Importantly, it offers an accuracy of 57%, which is in
line with the existing strongly-supervised approaches.
( 2
min )
Information Extraction (IE) seeks to derive structured information from
unstructured texts, often facing challenges in low-resource scenarios due to
data scarcity and unseen classes. This paper presents a review of neural
approaches to low-resource IE from \emph{traditional} and \emph{LLM-based}
perspectives, systematically categorizing them into a fine-grained taxonomy.
Then we conduct empirical study on LLM-based methods compared with previous
state-of-the-art models, and discover that (1) well-tuned LMs are still
predominant; (2) tuning open-source LLMs and in-context learning (ICL) with the GPT family are promising
in general; (3) the optimal LLM-based technical solution for low-resource IE
can be task-dependent. In addition, we discuss low-resource IE with LLMs,
highlight promising applications, and outline potential research directions.
This survey aims to foster understanding of this field, inspire new ideas, and
encourage widespread applications in both academia and industry.
( 2
min )
Since ChatGPT works so well, are we on the cusp of solving science with AI?
Is not AlphaFold2 suggestive that the potential of LLMs in biology and the
sciences more broadly is limitless? Can we use AI itself to bridge the lack of
data in the sciences in order to then train an AI? Herein we present a
discussion of these topics.
( 2
min )
When visualizing a high-dimensional dataset, dimension reduction techniques
are commonly employed, providing a single two-dimensional view of the data. We
describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously
that generalizes the t-Stochastic Neighborhood Embedding approach. By using
different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different
types of clusters within the same high-dimensional dataset. This enables the
viewer to see and keep track of the different types of clusters, which is
harder to do when providing multiple 2D embeddings, where corresponding points
cannot be easily identified. We illustrate the utility of ENS-t-SNE with
real-world applications and provide an extensive quantitative evaluation with
datasets of different types and sizes.
( 2
min )
Traditional partial differential equation (PDE) solvers can be
computationally expensive, which motivates the development of faster methods,
such as reduced-order-models (ROMs). We present GPLaSDI, a hybrid deep-learning
and Bayesian ROM. GPLaSDI trains an autoencoder on full-order-model (FOM) data
and simultaneously learns simpler equations governing the latent space. These
equations are interpolated with Gaussian Processes, allowing for uncertainty
quantification and active learning, even with limited access to the FOM solver.
Our framework achieves speed-ups of up to 100,000 times with less than 7%
relative error on fluid mechanics problems.
( 2
min )
Training neural networks that require adversarial optimization, such as
generative adversarial networks (GANs) and unsupervised domain adaptations
(UDAs), suffers from instability. This instability problem comes from the
difficulty of the minimax optimization, and there have been various approaches
in GANs and UDAs to overcome this problem. In this study, we tackle this
problem theoretically through a functional analysis. Specifically, we show the
convergence property of the minimax problem by the gradient descent over the
infinite-dimensional spaces of continuous functions and probability measures
under certain conditions. Using this setting, we can discuss GANs and UDAs
comprehensively, which have been studied independently. In addition, we show
that the conditions necessary for the convergence property are interpreted as
stabilization techniques of adversarial training such as the spectral
normalization and the gradient penalty.
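One of the stabilization techniques named above, spectral normalization, divides a weight matrix by its largest singular value to enforce a Lipschitz constraint. A minimal dependency-free sketch via power iteration (illustrative, not a GAN-training implementation):

```python
def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def spectral_norm(W, iters=50):
    """Largest singular value of W, estimated by power iteration on W^T W."""
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(W, v)                 # u = W v
        v = matvec(transpose(W), u)      # v = W^T u
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    u = matvec(W, v)
    return sum(x * x for x in u) ** 0.5  # ||W v|| for unit v

def spectrally_normalize(W):
    """Divide W by its spectral norm so the linear map is 1-Lipschitz."""
    s = spectral_norm(W)
    return [[w / s for w in row] for row in W]

W = [[3.0, 0.0], [4.0, 5.0]]
W_sn = spectrally_normalize(W)
```

After normalization the map has spectral norm 1, which bounds how sharply the discriminator (or feature extractor) can vary — the functional-analytic condition the abstract connects to convergence.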
( 2
min )
Normative models in neuroimaging learn the brain patterns of healthy
population distribution and estimate how disease subjects like Alzheimer's
Disease (AD) deviate from the norm. Existing variational autoencoder
(VAE)-based normative models using multimodal neuroimaging data aggregate
information from multiple modalities by estimating product or averaging of
unimodal latent posteriors. This can often lead to uninformative joint latent
distributions which affects the estimation of subject-level deviations. In this
work, we addressed the prior limitations by adopting the
Mixture-of-Product-of-Experts (MoPoE) technique which allows better modelling
of the joint latent posterior. Our model labelled subjects as outliers by
calculating deviations from the multimodal latent space. Further, we identified
which latent dimensions and brain regions were associated with abnormal
deviations due to AD pathology.
( 2
min )
In 2023, online payment fraud cost the world US$48 billion. Businesses prioritize fighting payment fraud and minimizing its financial and reputational damage. In addition to monetary losses, payment fraud can damage a customer’s trust and loyalty, as well as increase the scrutiny from regulators and law enforcement. Organizations use machine learning to combat this growing… Read More »Decoding the Future: The Intersection of Advanced Analytics and Fraud Prevention in Revolutionizing Digital Payments
The post Decoding the Future: The Intersection of Advanced Analytics and Fraud Prevention in Revolutionizing Digital Payments appeared first on Data Science Central.
( 22
min )
Large language model (LLM) training has become increasingly popular over the last year with the release of several publicly available models such as Llama2, Falcon, and StarCoder. Customers are now training LLMs of unprecedented size ranging from 1 billion to over 175 billion parameters. Training these LLMs requires significant compute resources and time as hundreds […]
( 8
min )
Structured data, defined as data following a fixed pattern such as information stored in columns within databases, and unstructured data, which lacks a specific form or pattern like text, images, or social media posts, both continue to grow as they are produced and consumed by various organizations. For instance, according to International Data Corporation (IDC), […]
( 13
min )
The post describes how you can overcome the challenges of retaining data ownership and preserving data privacy while using LLMs by deploying Protopia AI’s Stained Glass Transform to protect your data. Protopia AI has partnered with AWS to deliver the critical component of data protection and ownership for secure and efficient enterprise adoption of generative AI. This post outlines the solution and demonstrates how it can be used in AWS for popular enterprise use cases like Retrieval Augmented Generation (RAG) and with state-of-the-art LLMs like Llama 2.
( 12
min )
Many patients in low- and middle-income countries rely on facilitated online health communities for information and support. Discover how large language models can assist the facilitators and boost outcomes.
The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.
( 10
min )
Cecily Morrison and Karolina Pakėnaitė are collaborators on a research prototype designed to help members of the blind community find their personal items. Learn how the work is advancing an approach to empower people to shape their own AI experiences.
The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.
( 28
min )
‘Tis the season for friends, family and beautifully rendered Santa animations from this week’s In the NVIDIA Studio artist, 3D expert Božo Balov.
( 7
min )
A new, data-driven approach could lead to better solutions for tricky optimization problems like global package routing or power grid operation.
( 9
min )
Based on the standard VMAF implementation, we propose an implementation of
VMAF using the PyTorch framework. For this implementation, comparisons with the
standard (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ in VMAF units. We
investigate gradient computation when using VMAF as an objective function and
demonstrate that training with this function does not produce ill-behaved
gradients. The implementation is then used to train a preprocessing filter,
whose performance is shown to be superior to that of the unsharp masking
filter. The resulting filter is also easy to implement and can be applied in
video processing tasks to improve video compression. This is confirmed by
the results of numerical experiments.
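For reference, the unsharp masking baseline mentioned above can be sketched in a few lines of plain Python; the 1-D box blur, kernel radius, and sharpening amount are illustrative assumptions rather than the settings used in the paper:

```python
def box_blur(signal, radius=1):
    """Moving-average blur; the window is clipped at the boundaries."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def unsharp_mask(signal, radius=1, amount=0.5):
    """Sharpen by adding back the high-frequency residual:
    sharpened = x + amount * (x - blur(x))."""
    blurred = box_blur(signal, radius)
    return [x + amount * (x - b) for x, b in zip(signal, blurred)]

edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
sharpened = unsharp_mask(edge)
print(sharpened)  # over- and undershoot appear on both sides of the edge
```

The overshoot around the edge is the hand-tuned behavior of unsharp masking; the learned preprocessing filter is the data-driven alternative to it.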
( 2
min )
We consider a setting where a population of artificial learners is given, and
the objective is to optimize aggregate measures of performance, under
constraints on training resources. The problem is motivated by the study of
peer learning in human educational systems. In this context, we study natural
knowledge diffusion processes in networks of interacting artificial learners.
By `natural', we mean processes that reflect human peer learning, where the
students' internal states and learning processes are mostly opaque, and the
main degree of freedom lies in the formation of peer learning groups by a
coordinator who can potentially evaluate the learners before assigning them to
peer groups. Among other findings, we empirically show that such processes indeed make
effective use of the training resources, and enable the design of modular
neural models that have the capacity to generalize without being prone to
overfitting noisy labels.
( 2
min )
In this paper we consider the numerical solution to the soft-margin support
vector machine optimization problem. This problem is typically solved using the
SMO algorithm, given the high computational complexity of traditional
optimization algorithms when dealing with large-scale kernel matrices. In this
work, we propose employing an NFFT-accelerated matrix-vector product using an
ANOVA decomposition for the feature space that is used within an interior point
method for the overall optimization problem. As this method requires the
solution of a linear system of saddle-point form, we suggest a preconditioning
approach that is based on low-rank approximations of the kernel matrix together
with a Krylov subspace solver. We compare the accuracy of the ANOVA-based
kernel with the default LIBSVM implementation. We investigate the performance
of the different preconditioners as well as the accuracy of the ANOVA kernel on
several large-scale datasets.
( 2
min )
In this paper, we explore the use of UAV-assisted uplink semantic
communications to improve data collection efficiency for metaverse users in
remote areas. To reduce the uplink data collection time while balancing the
trade-off between reconstruction quality and computational energy cost, we
propose a hybrid action reinforcement learning (RL) framework to make
decisions on semantic model scale, channel allocation, transmission power, and
UAV trajectory. The variables are classified into discrete and continuous
types, which are optimized by two different RL agents to generate the
combined action. Simulation results indicate that the proposed hybrid action
reinforcement learning framework can effectively improve the efficiency of
uplink semantic data collection under different parameter settings and
outperforms the benchmark scenarios.
( 2
min )
Bug reports are an essential aspect of software development, and it is
crucial to identify and resolve them quickly to ensure the consistent
functioning of software systems. Retrieving similar bug reports from an
existing database can help reduce the time and effort required to resolve bugs.
In this paper, we compared the effectiveness of semantic textual similarity
methods for retrieving similar bug reports based on a similarity score. We
explored several embedding models such as TF-IDF (Baseline), FastText, Gensim,
BERT, and ADA. We used the Software Defects Data containing bug reports for
various software projects to evaluate the performance of these models. Our
experimental results showed that BERT generally outperformed the rest of the
models in terms of recall, followed by ADA, Gensim, FastText, and TF-IDF. Our
study provides insights into the effectiveness of different embedding methods
for retrieving similar bug reports and highlights the impact of selecting the
appropriate one for this task. Our code is available on GitHub.
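For context, the TF-IDF baseline used for comparison can be sketched with standard-library Python; the tokenization, smoothing scheme, and toy bug reports are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weights for a list of tokenized documents (sparse dicts)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

reports = [
    "app crashes on login with null pointer".split(),
    "crash at login screen null pointer exception".split(),
    "slow rendering of dashboard charts".split(),
]
vecs = tfidf_vectors(reports)
# Rank the other reports by similarity to report 0.
ranked = sorted(range(1, len(reports)),
                key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
print(ranked)  # → [1, 2]: the other login crash is retrieved first
```

The embedding models compared in the paper replace the sparse TF-IDF vectors with dense learned representations, but the retrieval step (rank by similarity score) is the same.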
( 2
min )
Extracting the rules of real-world multi-agent behaviors is a current
challenge in various scientific and engineering fields. Biological agents
independently have limited observation and mechanical constraints; however,
most conventional data-driven models ignore such assumptions, resulting in a
lack of biological plausibility and model interpretability for behavioral
analyses. Here we propose sequential generative models with partial observation
and mechanical constraints in a decentralized manner, which can model agents'
cognition and body dynamics, and predict biologically plausible behaviors. We
formulate this as a decentralized multi-agent imitation-learning problem,
leveraging binary partial observation and decentralized policy models based on
hierarchical variational recurrent neural networks with physical and
biomechanical penalties. Using real-world basketball and soccer datasets, we
show the effectiveness of our method in terms of the constraint violations,
long-term trajectory prediction, and partial observation. Our approach can be
used as a multi-agent simulator to generate realistic trajectories using
real-world data.
( 2
min )
The Shapley value is widely regarded as a trustworthy attribution metric.
However, when people use Shapley values to explain the attribution of input
variables of a deep neural network (DNN), it usually requires a very high
computational cost to approximate relatively accurate Shapley values in
real-world applications. Therefore, we propose a novel network architecture,
the HarsanyiNet, which makes inferences on the input sample and simultaneously
computes the exact Shapley values of the input variables in a single forward
propagation. The HarsanyiNet is designed on the theoretical foundation that the
Shapley value can be reformulated as the redistribution of Harsanyi
interactions encoded by the network.
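For a sense of why approximation is usually needed, the exact Shapley value can be computed by brute-force coalition enumeration for a tiny game; the toy value function below is an illustrative assumption, and this O(2^n) enumeration is precisely the cost that HarsanyiNet's single forward pass avoids:

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values by enumerating every coalition S not containing i:
    phi_i = sum_S |S|! (n - |S| - 1)! / n! * (value(S + {i}) - value(S))."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy value function: additive contributions plus a synergy bonus that is
# paid only when features 0 and 1 appear together.
contrib = [3.0, 1.0, 2.0]
def v(S):
    return sum(contrib[i] for i in S) + (4.0 if {0, 1} <= S else 0.0)

phi = shapley_values(3, v)
print([round(p, 6) for p in phi])  # → [5.0, 3.0, 2.0]: the bonus splits evenly
```

Note the efficiency axiom holds by construction: the attributions sum to v(all) - v(empty).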
( 2
min )
Learning disentangled causal representations is a challenging problem that
has gained significant attention recently due to its implications for
extracting meaningful information for downstream tasks. In this work, we define
a new notion of causal disentanglement from the perspective of independent
causal mechanisms. We propose ICM-VAE, a framework for learning causally
disentangled representations supervised by causally related observed labels. We
model causal mechanisms using learnable flow-based diffeomorphic functions to
map noise variables to latent causal variables. Further, to promote the
disentanglement of causal factors, we propose a causal disentanglement prior
that utilizes the known causal structure to encourage learning a causally
factorized distribution in the latent space. Under relatively mild conditions,
we provide theoretical results showing the identifiability of causal factors
and mechanisms up to permutation and elementwise reparameterization. We
empirically demonstrate that our framework induces highly disentangled causal
factors, improves interventional robustness, and is compatible with
counterfactual generation.
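The flow-based mechanism idea can be illustrated with a toy structural model; the three-variable DAG and fixed affine coefficients below are assumptions standing in for the learnable diffeomorphic flows in ICM-VAE:

```python
import random

random.seed(2)

# Toy causal DAG over three latents (an assumed example): z0 -> z1, z0 -> z2.
# Each mechanism is affine in its own noise variable, so it is invertible
# (diffeomorphic) given the parents; the fixed coefficients stand in for
# the learnable flow parameters of the actual model.
def noise_to_latents(eps):
    z0 = 1.5 * eps[0]                 # root cause: pure noise scaling
    z1 = 0.8 * z0 + 0.5 * eps[1]      # child of z0
    z2 = -0.6 * z0 + 2.0 * eps[2]     # second child of z0
    return [z0, z1, z2]

def latents_to_noise(z):
    """Exact inverse: solve each mechanism for its own noise term."""
    return [z[0] / 1.5,
            (z[1] - 0.8 * z[0]) / 0.5,
            (z[2] + 0.6 * z[0]) / 2.0]

eps = [random.gauss(0.0, 1.0) for _ in range(3)]
z = noise_to_latents(eps)
roundtrip = latents_to_noise(z)
print(max(abs(a - b) for a, b in zip(eps, roundtrip)))  # numerically ~0
```

Invertibility given the parents is what lets the model recover the independent noise variables, which in turn supports the intervention and counterfactual operations mentioned above.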
( 2
min )
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established bounds differ by a factor that is
logarithmic in the width.
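As a rough illustration of the quantity being bounded, one can sandwich the Lipschitz constant of a small random ReLU network empirically; the dimensions, the He-style initialization, and the crude Frobenius-norm relaxation below are assumptions for illustration only, and the paper's bounds are much tighter:

```python
import math
import random

random.seed(0)

def he_normal(rows, cols):
    """He-style Gaussian weights with variance 2 / fan_in."""
    std = math.sqrt(2.0 / cols)
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def forward(weights, x):
    """Apply the layers; ReLU after every layer except the last."""
    for l, W in enumerate(weights):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        if l < len(weights) - 1:
            x = [max(0.0, v) for v in x]
    return x

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

dims = [8, 32, 32, 1]  # input, two hidden widths, scalar output
weights = [he_normal(dims[l + 1], dims[l]) for l in range(len(dims) - 1)]

# Upper bound: the product of Frobenius norms dominates the product of
# spectral norms, which bounds the Lipschitz constant when every
# activation is 1-Lipschitz (ReLU is).
upper = 1.0
for W in weights:
    upper *= math.sqrt(sum(w * w for row in W for w in row))

# Lower bound: the largest difference quotient over random input pairs.
lower = 0.0
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(dims[0])]
    y = [random.gauss(0, 1) for _ in range(dims[0])]
    num = norm([a - b for a, b in zip(forward(weights, x), forward(weights, y))])
    den = norm([a - b for a, b in zip(x, y)])
    lower = max(lower, num / den)

print(0.0 < lower <= upper)  # the two estimates sandwich the true constant
```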
( 2
min )
In this paper, we put forth a novel framework (named ``RYU'') for the
construction of ``safe'' balls, i.e., regions that provably contain the dual
solution of a target optimization problem. We concentrate on the standard setup
where the cost function is the sum of two terms: a closed, proper, convex
Lipschitz-smooth function and a closed, proper, convex function. The RYU
framework is shown to generalize or improve upon all the results proposed in
the last decade for the considered family of optimization problems.
( 2
min )
Graph contrastive learning has shown great promise when labeled data is
scarce but large unlabeled datasets are available. However, it often does not
take uncertainty estimation into account. We show that a variational Bayesian
neural network approach can be used to improve not only the uncertainty
estimates but also the downstream performance on semi-supervised
node-classification tasks. Moreover, we propose a new measure of uncertainty
for contrastive learning that is based on the disagreement in likelihood due
to different positive samples.
( 2
min )
We present an efficient parameter-free approach for statistical learning from
corrupted training sets. We identify corrupted and non-corrupted samples using
latent Bernoulli variables, and therefore formulate the robust learning problem
as maximization of the likelihood where latent variables are marginalized out.
The resulting optimization problem is solved via variational inference using an
efficient Expectation-Maximization based method. The proposed approach improves
over the state-of-the-art by automatically inferring the corruption level and
identifying outliers, while adding minimal computational overhead. We
demonstrate our robust learning method on a wide variety of machine learning
tasks, including online learning and deep learning, where it exhibits the ability to
adapt to different levels of noise and attain high prediction accuracy.
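The marginalized-Bernoulli idea can be illustrated with a stdlib-only robust mean estimator; the fixed outlier component, the noise scales, and the toy data are assumptions made for illustration, whereas the paper's method infers the corruption model automatically:

```python
import math
import random

random.seed(1)

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def robust_mean_em(data, n_iter=50, outlier_sigma=10.0):
    """EM for a mean under latent Bernoulli corruption indicators: each
    point is an inlier N(mu, 1) with probability 1 - eps, otherwise a
    broad N(0, outlier_sigma) outlier. Marginalizing the indicators gives
    a two-component mixture whose likelihood EM maximizes."""
    mu, eps = sum(data) / len(data), 0.1
    for _ in range(n_iter):
        # E-step: posterior probability that each point is corrupted.
        resp = []
        for x in data:
            p_in = (1.0 - eps) * gauss_pdf(x, mu, 1.0)
            p_out = eps * gauss_pdf(x, 0.0, outlier_sigma)
            resp.append(p_out / (p_in + p_out))
        # M-step: mean over probable inliers; updated corruption level.
        w = [1.0 - r for r in resp]
        mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
        eps = sum(resp) / len(resp)
    return mu, eps

# 80 inliers around 3.0 mixed with 20 gross outliers.
data = ([random.gauss(3.0, 1.0) for _ in range(80)]
        + [random.gauss(0.0, 10.0) for _ in range(20)])
mu, eps = robust_mean_em(data)
print(round(mu, 2), round(eps, 2))  # mean near 3.0, corruption near 0.2
```

The posterior responsibilities double as per-sample outlier scores, which is the mechanism behind the automatic outlier identification described above.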
( 2
min )
Canonical Correlation Analysis (CCA) has been widely applied to jointly embed
multiple views of data in a maximally correlated latent space. However, the
alignment between various data perspectives, which is required by traditional
approaches, is unclear in many practical cases. In this work, we propose a new
framework, Aligned Canonical Correlation Analysis (ACCA), to address this
challenge by iteratively solving for the alignment and the multi-view embedding.
( 2
min )
This paper elucidates the challenges and opportunities inherent in
integrating data-driven methodologies into geotechnics, drawing inspiration
from the success of materials informatics. Highlighting soil complexity and
heterogeneity and the lack of comprehensive data, the discussion
underscores the pressing need for community-driven database initiatives and
open science movements. By leveraging the transformative power of deep
learning, particularly in feature extraction from high-dimensional data and the
potential of transfer learning, we envision a paradigm shift towards a more
collaborative and innovative geotechnics field. The paper concludes with a
forward-looking stance, emphasizing the revolutionary potential brought about
by advanced computational tools like large language models in reshaping
geotechnics informatics.
( 2
min )
This paper aims to define, quantify, and analyze the feature complexity that
is learned by a DNN. We propose a generic definition for the feature
complexity. Given the feature of a certain layer in the DNN, our method
disentangles feature components of different complexity orders from the
feature. We further design a set of metrics to evaluate the reliability, the
effectiveness, and the significance of over-fitting of these feature
components. Furthermore, we successfully discover a close relationship between
the feature complexity and the performance of DNNs. As a generic mathematical
tool, the feature complexity and the proposed metrics can also be used to
analyze the success of network compression and knowledge distillation.
( 2
min )
To enhance the gaming experience, studios and developers spend tremendous effort creating photorealistic, immersive in-game environments. But non-playable characters (NPCs) often get left behind. Many behave in ways that lack depth and realism, making their interactions repetitive and forgettable. Inworld AI is changing the game by using generative AI to drive NPC behaviors that are Read article >
( 6
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Hasan Burak Yel, and Damla Şentürk from Getir. Established in 2015, Getir has positioned itself as the trailblazer in the sphere of ultrafast grocery delivery. This innovative tech company has revolutionized the last-mile delivery segment with its compelling offering of “groceries in minutes.” With a […]
( 7
min )
The recent upheavals at OpenAI and OpenAI’s Chief Scientist’s apprehensions regarding the “safety” of AI have ignited a fresh wave of concerns and fears about the march towards Artificial General Intelligence (AGI) and “Super Intelligence.” AI safety concerns the development of AI systems that are aligned with human values and do not cause harm to humans. Some… Read More »A Different AI Scenario: AI and Justice in a Brave New World – Part 1
The post A Different AI Scenario: AI and Justice in a Brave New World – Part 1 appeared first on Data Science Central.
( 22
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )